GENERATION AND SELECTION OF NOVEL BINDING PROTEINS
BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates to development of novel
binding proteins by an iterative process of
mutagenesis, expression, chromatographic selection, and
amplification.

Information Disclosure Statement

The amino acid sequence of a protein determines
its three-dimensional (3D) structure, which in turn
determines protein functioning (EPST63, ANFI73). A
widely accepted system of classifying protein structure
may be found in Schulz and Schirmer (SCHU79, Ch5).
Their classification system is adopted herein.

Shortle (SHOR85), Sauer and colleagues (PAKU86,
REID88a), and Caruthers and colleagues (EISE85) have
shown that some residues on the polypeptide chain are
more important than others in determining the 3D
structure of a protein. The 3D structure is
essentially unaffected by the identity of the amino
acids at some loci; at other loci only one or a few
types of amino acid are allowed. In most cases, loci
where wide variety is allowed have the amino acid side
group directed toward the solvent. Loci where limited
variety is allowed frequently have the side group
directed toward other parts of the protein. Thus
substitutions of amino acids that are exposed to
solvent are less likely to affect the 3D structure than
are substitutions at internal loci. (See also SCHU79,
p169-171 and CREI84, p239-245, 314-315).



The secondary structure (helices, sheets, turns,
loops) of a protein is determined mostly by local
sequence. Certain amino acids tend to be correlated
with certain secondary structures and the commonly used
Chou-Fasman (CHOU74, CHOU78a, CHOU78b) rules depend on
these correlations. The correlations between amino-acid
type and secondary structure are not, however,
absolute, and every amino acid type has been observed
in helices and in both parallel and antiparallel
sheets. Kabsch and Sander (KABS84) report on
pentapeptides of identical sequence found in different
proteins; in some cases the conformations of the
pentapeptides are very different. Argos (ARGO87)
surveyed pentapeptides of similar sequence in different
proteins and found that the structures of the
sequence-similar subsequences were frequently different.

The residues that join helices to helices, helices
to sheets, and sheets to sheets are called turns and
loops and have recently been classified by Richardson
(RICH81), Thornton (THOR88), Sutcliffe et al. (SUTC87a)
and others. Insertions and deletions are more readily
tolerated in loops than elsewhere. Thornton et al.
(THOR88) have summarized many observations indicating
that related proteins usually differ most at the loops
which join the more regular elements of secondary
structure.

When the amino acid sequence of one protein has
been changed to be more like the sequence of a second
protein, the properties of the novel protein usually
approach the properties of the second protein. Wells
et al. (WELL87a) reported that changing three residues
in subtilisin from Bacillus amyloliquefaciens to be the
same as the corresponding residues in subtilisin from
B. licheniformis produced a protease that had nearly
the same activity as the subtilisin from the latter
organism. There were 82 differences remaining in the
sequences. The three residues changed were chosen
because they were the only differences within 7
Angstroms (A) of the active site.
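
The selection rule just described (change only those residues that differ between the two proteins and that lie within a cutoff distance of the active site) can be sketched as follows. This is an illustration only, not the procedure of the cited work: the function name, the use of one coordinate per residue, the representation of the active site as a single point, and the sample data are all hypothetical assumptions.

```python
import math


def differing_residues_near_site(seq_a, seq_b, coords, site, cutoff=7.0):
    """Return indices where two aligned sequences differ near the active site.

    seq_a, seq_b : aligned sequences of equal length (one-letter codes)
    coords       : one (x, y, z) point per residue, in Angstroms (assumed)
    site         : (x, y, z) point standing in for the active site (assumed)
    cutoff       : distance threshold in Angstroms
    """
    if not (len(seq_a) == len(seq_b) == len(coords)):
        raise ValueError("sequences and coordinates must have equal length")
    return [
        i
        for i, (a, b, xyz) in enumerate(zip(seq_a, seq_b, coords))
        if a != b and math.dist(xyz, site) <= cutoff
    ]


# Hypothetical toy data: positions 2 and 4 differ, only position 2 is close.
toy_a = "ACDEF"
toy_b = "ACNEY"
toy_coords = [(0, 0, 0), (3, 0, 0), (5, 0, 0), (9, 0, 0), (20, 0, 0)]
toy_site = (2, 0, 0)
print(differing_residues_near_site(toy_a, toy_b, toy_coords, toy_site))
```

In practice the coordinates would come from a solved 3D structure, and the distance would normally be measured between atoms rather than between single representative points.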

Many proteins bind non-covalently but very tightly
and specifically to some other characteristic
molecules. Schulz and Schirmer summarize many
observations on the binding of proteins to other
proteins (SCHU79, p98-105). For example, haemoglobin
alpha chains bind very tightly to haemoglobin beta
chains (delta G less than -11.0 Kcal/mole); antibodies
bind tightly to antigens (Kd values range from
10^[illegible] to 10^[illegible], where Kd is the
dissociation constant, equal to [A][B]/[A:B]); basic
bovine pancreatic trypsin inhibitor (BPTI) binds
tightly to trypsin (Kd = 6.0 x 10^-14 M (TSCH87),
delta G = -18.0 Kcal/mole); and avidin binds to biotin
(Kd = 1.3 x 10^-15 M (CREI84, p362)).
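
The dissociation constants and free energies quoted above are linked by the standard thermodynamic relation delta G = RT ln(Kd). The following minimal sketch converts one into the other. It assumes a temperature of 298.15 K, which the text does not state, and the function name is chosen for this example only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Return delta G of binding (kcal/mol) for a dissociation constant in M.

    Uses delta G = R * T * ln(Kd); tighter binding (smaller Kd) gives a
    more negative value. The default temperature is an assumption.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


bpti_trypsin = binding_free_energy(6.0e-14)   # Kd quoted for BPTI/trypsin
avidin_biotin = binding_free_energy(1.3e-15)  # Kd quoted for avidin/biotin
print(f"BPTI/trypsin:  {bpti_trypsin:.1f} kcal/mol")
print(f"avidin/biotin: {avidin_biotin:.1f} kcal/mol")
```

Under the assumed temperature, the BPTI/trypsin value comes out close to the -18.0 Kcal/mole quoted above, and the avidin/biotin value is more negative still, as expected for the smaller Kd.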



In each case the binding results from
complementarity of the surfaces that come into contact:
bumps fit into holes, unlike charges come together,
dipoles align, and hydrophobic atoms contact other
hydrophobic atoms. Although bulk water is excluded,
individual water molecules are frequently found filling
space in intermolecular interfaces; these waters
usually form hydrogen bonds to one or more atoms of the
protein or to other bound water. Thus proteins found
in nature have not attained, nor do they require,
perfect complementarity to bind tightly and
specifically to their substrates. Only in rare cases
is there essentially perfect complementarity; then the
binding is extremely tight (as, for example, avidin
binding to biotin).

The relative importance of electrostatic vs.
hydrophobic interactions is not fully understood
(SCHU79, p105). Attraction between oppositely charged
groups apparently contributes little to the free energy
of binding between proteins and other molecules.
Like-charged groups can, however, increase specificity:
repulsion of like-charged groups in the binding
interface or even unpaired charges in the interface can
greatly reduce or eliminate binding in instances where
shape and hydrophobic interactions would otherwise
induce it.



It has been observed, however, that proteins can
bind to other molecules such that like-charged groups
are juxtaposed; in such instances repulsion is reduced
or eliminated by inclusion of oppositely charged ions
in the binding interface. An example of this
phenomenon is the inclusion of two positively charged
calcium ions between each pair of subunits of turnip
crinkle virus (HOGL83). The subunits each contain two
negatively charged D (single-letter amino acid codes
are given in Table 1) and E residues in close
proximity.



The factors affecting protein binding are known
(CHOT75, CHOT76, SCHU79, p98-107, and CREI84, Ch8), but
designing new complementary surfaces has proved
difficult. Although some rules have been developed for
substituting side groups (SUTC87b), the side groups of
proteins are floppy and it is difficult to predict what
conformation a new side group will take. Further, the
forces that bind proteins to other molecules are all
relatively weak and it is difficult to predict the
effects of these forces.

Recently, Quiocho and collaborators (QUIO87)
elucidated the structures of several periplasmic
binding proteins from Gram-negative bacteria. They
found that the proteins, despite having low sequence
homology and differences in structural detail, have
certain important similarities. Each of the proteins
they investigated is composed of two domains that are
joined by three strands of protein. The binding site
is located between the two domains and is isolated from
bulk solvent. The structure of the binding site is
dense and highly ordered, and binding constants are
very high. The researchers suggest that binding of
ligands causes a conformational change that alters the
relative positions of the two domains.

The researchers found that each of the periplasmic
binding proteins has numerous residues (seven or more)
arrayed about the binding site. Surprisingly, ionic
ligands are not bound by ionic side groups of opposite
charge, but by main-chain components. Electrical
charge seems to be neutralized by dipole interactions.
Further, hydrophobic contacts play an important role in
binding.

Based on their investigations of these binding
proteins, Quiocho et al. suggest it is unlikely that,
using current protein engineering methods, proteins can
be constructed with binding properties superior to
those of proteins that occur naturally.



Wilkinson et al. (WILK84) have found, however,
that enzyme-substrate affinity may be increased by
protein engineering. They reported that a mutant of
tyrosyl tRNA synthetase of Bacillus stearothermophilus
that has proline at residue 51 exhibits a 100-fold
increase in affinity for ATP.

Substitution of one amino acid for another at a
surface locus may profoundly alter binding properties
of the protein other than substrate binding, without
affecting the tertiary structure of the protein. For
example, in sickle-cell haemoglobin the change of the
surface residue E6 to V in the beta chains causes
deoxyhaemoglobin-S to form fibers through self binding
(DICK83, p125-145). Love and others have shown that
the tertiary and quaternary structure of the
haemoglobin are not changed (PADL85, WISH75, WISH76).

Tan and Kaiser (TANK77) and Tschesche et al.
(TSCH87) showed that changing a single amino acid in
BPTI greatly reduces its binding to trypsin, but that
some of the new molecules retain the parental
characteristics of binding to and inhibiting
chymotrypsin, while others exhibit new binding to
elastase. Caruthers and others (EISE85) have shown
that changes of single amino acids on the surface of
the lambda Cro repressor greatly reduce its affinity
for the natural operator OR3, but greatly increase the
binding of the mutant protein to a mutant operator.
Thus changing the surface of a binding protein may
alter its specificity without abolishing binding
activity.






The recently developed techniques of "reverse
genetics" have been used to produce single specific
mutations at precise base pair loci (OLIP86, OLIP87,
and AUSU87). Mutations are generally detected by
sequencing and in some cases by loss of wild-type
function. These procedures allow researchers to
analyze the function of each residue in a protein
(MILL88) or of each base pair in a regulatory DNA
sequence (CHEN88). In these analyses, the norm has
been to strive for the classical goal of obtaining
mutants carrying a single alteration (AUSU87).

Reverse genetics is frequently applied to coding
regions to determine which residues are most important
to the protein structure and function. In such
studies, isolation of a single mutant at each residue
of the protein gives an initial estimate of which
residues play crucial roles.

Prior to the method of the present invention, two
general approaches have been developed to create novel
mutant proteins through reverse genetics. Both methods
start with a clone of the gene of interest. In one
approach, dubbed "protein surgery" (reviewed by Dill
(DILL87)), a specific substitution is introduced at a
single protein residue by a synthetic method using the
corresponding natural or synthetic cloned gene. Craik
et al. (CRAI85), Rao et al. (RAOS87), and Bash et al.
(BASH87) have used this approach to determine the
effects on structure and function of specific
substitutions in trypsin.

The other approach has been to generate a variety
of mutants at many loci within the cloned gene, the
"gene-directed random mutagenesis" method. The
specific location and nature of the change are
determined by DNA sequencing. It may be possible to
[...] and cloning of highly degenerate oligonucleotides and
have applied saturation mutagenesis to the study of
promoter sequence and function. They have suggested
that similar methods could be used to study genetic
expression of protein coding regions of genes, but they
do not say how one should: a) choose protein residues
to vary, or b) select or screen mutants with desirable
properties.

Reidhaar-Olson and Sauer (REID88a) have used
synthetic degenerate oligo-nts to vary simultaneously
two or three residues through all twenty amino acids in
the dimer interface of cI repressor from bacteriophage
lambda. They give no discussion of the limits on how
many residues could be varied at once nor do they
mention the problem of unequal abundance of DNA
encoding different amino acids. They looked for
proteins that either had wild-type dimerization or that
did not dimerize. They did not seek proteins having
novel binding properties and did not find any.
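
The problem of unequal abundance noted above follows directly from the degeneracy of the genetic code. The sketch below counts codons per amino acid under the standard genetic code and compares the expected DNA abundance of two protein variants. It assumes that every base is equally likely at every varied position (fully random NNN codons), which is an assumption of this example and not a description of the cited experiments; the function names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
# and "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (or stop).
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())


def abundance_ratio(seq_a, seq_b):
    """Expected DNA abundance of variant seq_a relative to variant seq_b,
    assuming equiprobable bases at every varied codon position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("variants must have the same number of residues")
    ratio = 1.0
    for a, b in zip(seq_a, seq_b):
        ratio *= CODONS_PER_RESIDUE[a] / CODONS_PER_RESIDUE[b]
    return ratio


def protein_library_size(n_residues):
    """Number of distinct protein sequences when n residues are fully varied."""
    return 20 ** n_residues


print(CODONS_PER_RESIDUE["L"], CODONS_PER_RESIDUE["M"], CODONS_PER_RESIDUE["*"])
print(abundance_ratio("LLL", "MMM"))
print(protein_library_size(3))
```

Because leucine has six codons and methionine only one, a variant with three leucines is expected to be far more abundant in such a DNA population than a variant with three methionines, and the number of possible protein sequences grows by a factor of twenty with each additional varied residue.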



Several researchers have designed and synthesized
proteins de novo. These designed proteins are small
and most have been synthesized in vitro as polypeptides
rather than genetically. Gutte and colleagues have
made a polypeptide that binds DDT in 50% ethanol
(MOSE83). Recently Moser et al. (MOSE87) reported
genetic expression in E. coli both of the designed 24
residue DDT-binding protein and of fusions of the
DDT-binding sequence to LacZ. They state that design of
biologically active proteins is currently impossible.



Erickson et al. (ERIC86) have designed and
synthesized a series of proteins, which they have named
betabellins, that are meant to have beta sheets. They
suggest use of polypeptide synthesis with mixed
reagents to produce several hundred analogous
betabellins. They suggest the mixture be passed over a
column to recover the analogues with high affinity for
a chosen target compound bound to the column. They
envision successive rounds of mixed synthesis of
variant proteins and purification by specific binding.
They do not discuss how residues should be chosen for
variation. Because proteins can not be amplified, the
researchers must sequence the recovered protein to
learn which substitutions improve binding. The
researchers must limit the level of diversity so that
each variety of protein will be present in sufficient
quantity for the isolated fraction to be sequenced.

A number of methods have been developed to
separate cells through their affinity to various
substances. Bonnafous et al. (BONN85) review methods
that have been applied to animal cells, and cite two
common problems: a) non-specific interactions between
cells and affinity supports, and b) irreversible
binding of cells to affinity matrices. Possible
reasons for irreversible binding include multiple
points of attachment and very high affinity between
cells and antibodies used as affinity materials.
Chromatographic separation of animal cells is still
difficult because of their fragility. Bacterial cells,
bacterial spores, and some bacteriophage, however, are
sturdier than animal cells and have been fractionated
based on proteins displayed on their surfaces.



Ferenci and collaborators have published a series
of papers on the chromatographic isolation of mutants
of the maltose-transport protein LamB of E. coli
(WAND79, FERE80a, FERE80b, FERE80c, FERE82a, FERE82b,
FERE83, CLUN84, FERE86a, FERE86b, FERE86c, FERE87a,
FERE87b, HEIN87, and HEIN88). The papers report that
spontaneous and induced mutants at the lamB genetic
locus can be isolated by chromatography over a column
supporting immobilized maltose, maltodextrins, or
starch, i.e. carbohydrates that could be metabolized by
the bacteria. The reports speculate that other
applications are possible, but specifically mention
only the elucidation of the residues responsible for
the selectivity of the maltodextrin pore or similar
pore proteins.

Ferenci's experiments measured the combination of
the individual affinity of mutant LamB molecules and
the level of expression. Several classes of mutants in
lamB were isolated. One class had higher affinities
for both maltose and starch, one class had lower
affinity for starch but higher affinity for maltose,
and another class had higher affinity for starch but
lower affinity for maltose.

Mutants were generated either by hydroxylamine
treatment of a plasmid carrying the entire gene, or by
insertions of two extra codons at natural HpaII sites.
Levels of mutagenesis were picked to provide single
point mutations or single insertions of two residues.
No multiple mutations were sought or found.

LamB is a large trimeric integral membrane
protein; such proteins are very difficult to
crystallize or even to solubilize. Therefore it is
difficult to use single-crystal protein X-ray
crystallography or NMR to obtain detailed 3D structural
information. Garavito et al. (GARA83) have obtained
crystals of LamB that diffract X-rays, but the 3D
structure of the protein has not yet been determined.

There are models (L.EliP.3 7, liCiron) that include the
secondary structure of LamB, i.e. they specify which
residues are in beta-sheet conformation, which residues
are in turns, and which residues are on the outside, in
the periplasm, or in the membrane. These models do not
specify how the beta sheets are arranged nor which
turns are close to which other turns.

Ferenci and Lee (FERE86a) reported on the
temperature sensitivity of carbohydrate binding in B.
stearothermophilus. At higher temperatures, the
organism breaks down the polysaccharide, the binding of
which was the object of the study. Clune, Lee, and
Ferenci (CLUN84) reported that presence of complete
O-antigen affected the binding properties of LamB on the
surface of E. coli. Both of these reports point up the
difficulties of working with live bacteria that can
metabolize chemicals and change their physiological
behavior during the chromatographic experiment. Heine
et al. (HEIN88) have used the chemotaxis of E. coli
recently to isolate mutants in lamB that are unaffected
in chemotaxis; this approach is limited to metabolites
that affect chemotaxis.

Makela et al. (MAKE80) reviewed methods that
involve chemically coupling antigens to bacteriophage
to produce a sensitive, quantitative detection system
for antibodies. The methods reviewed exploit the
ability to amplify the signal produced by antibodies
binding to the antigens coupled to the phage, through
growth of the phage. The antigens were joined to the
phage chemically and not encoded in the genes of the
phage. Thus there was no sorting of genetic material.
Furthermore, the objectives of the methods reviewed
involve titering the phage that fail to bind, as an
assay of antibody. The methods of the present
invention, in most cases, involve growth and
amplification of genetic packages that bind with high
affinity.

In 1985 Smith (SMIT85) reported inserting a
heterologous gene into gene III of bacteriophage f1.
The gene III protein is a minor coat protein necessary
for infectivity. In some cases the inserted gene
preserved the original reading frame, leading to
expression of heterologous protein as an inserted
domain in the gene III protein. Smith demonstrated
that the resulting strain of f1 virions are adsorbed by
antibody against the protein encoded by the
heterologous DNA. The antibody was bound to a
polystyrene petri dish. The phage were eluted at pH
2.2 and retained some infectivity. However, the single
copy of f1 gene III was used for insertion of the
heterologous gene so that all copies of gene III
protein were affected; infectivity of the resultant
phage was reduced 25-fold. Smith also demonstrated
that batch elution from a plate can separate f1 virions
that differ by only a few protein domains on their
surfaces.



Smith presented his method as a way to isolate
cloned genes using antibodies to the gene products. He
made no mention of mutagenizing the inserted genetic
material or of inducing novel binding properties in the
inserted protein domain.

De la Cruz et al. (CRUZ88) have expressed a
fragment of the repeat region of the circumsporozoite
protein from Plasmodium falciparum on the surface of
M13 as an insert in the gene III protein. They showed
that the recombinant phage were both antigenic and
immunogenic in rabbits, and that such recombinant phage
could be used for B epitope mapping. The researchers
suggest that similar recombinant phage could be used
for T epitope mapping and for vaccine development.
They do not suggest mutagenesis of the inserted
material.

Gene fragments coding for portions of hepatitis B
virus antigens have been fused to fragments of lamB.
If the point of fusion is in a region coding for
exposed domains of LamB, the HBV antigens appear on the
cell surface and are immunogenic (CHAR87). Charbit et
al. (CHAR87) suggest use of these engineered strains
for development of a live bacterial vaccine; they have
not reported interest in mutagenesis of the fused
heterologous gene fragments, nor in development of
binding capabilities.

Recently Tjian and colleagues (KADO86, BRIG87, and
JONE87) have shown that DNA of definite sequence bound
to an affinity column can be used to purify proteins
that bind the DNA sequence-specifically. The proteins
are purified as much as 1000-fold in two
chromatographic steps or S3-fold in a single step.

Patents and patent applications which may be of
interest include US Patent No. 4,704,692, "Computer
Based System and Method for Determining and Displaying
Possible Chemical Structures for Converting Double- or
Multiple-Chain Polypeptides to Single-Chain
Polypeptides" (Ladner '692), issued to Robert Charles
Ladner on 3 November 1987 and assigned to Genex
Corporation. Ladner '692 describes a design method for
converting proteins composed of two or more chains into
proteins of fewer polypeptide chains, but with
essentially the same 3D structure. There is no
mention of variegated DNA and no genetic selection.

Robert Charles Ladner also has six patent
applications pending before the USPTO and assigned to
Genex Corporation:

07/92,110
07/21,046
07/21,047
07/34,964
07/34,965
07/34,966

Sonia K. Guterman is named as a joint inventor on
US patent No. 4,745,056 ("Streptomyces Secretion
Vector") and on Ser. No. 21,465.



None of the Ladner or Guterman patents or
applications is believed to disclose or suggest the
present invention, but it is requested that each be
considered by the Examiner.

No admission is made that any cited reference is
prior art or pertinent prior art, and the dates given
are those appearing on the reference and may not be
identical to the actual publication date.

SUMMARY OF THE INVENTION

This invention relates to the construction,
expression, and selection of mutated genes that specify
novel proteins with desirable binding properties, as
well as these proteins themselves. The substances
bound by these proteins, hereinafter referred to as
"targets", may be, but need not be, proteins. Targets
may include other biological or synthetic
macromolecules as well as organic and inorganic
molecules.




In one embodiment, the invention relates to:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an
outer-surface-displayed potential binding protein
comprising (i) a structural signal directing the
display of the protein on the outer surface of the
package and (ii) a potential binding domain for
binding said target, where a plurality of
different potential binding domains are displayed
by the individual packages;



The novel binding proteins may be obtained: 1) by
mutating a gene encoding a known binding protein within
the subsequence encoding a known binding domain, or 2)
by taking such a subsequence of the gene for a first
protein and combining it with all or part of a gene for
a second protein (which may or may not be itself a
known binding protein), 3) by mutating a gene encoding
a protein which, while not possessing a known binding
activity, possesses a secondary or higher structure
that lends itself to binding activity (clefts, grooves,
etc.), or 4) by mutating a gene encoding a known
binding protein but not in the subsequence known to
cause the binding. The protein from which the novel
binding protein is derived need not have any specific
affinity for the target material.

The invention further relates to a method of
preparing a mixed population of replicable genetic
packages in which each package includes a gene
expressing a potential binding protein in such a manner
that the protein is presented on the outer surface of
the package. This method comprises:

i) preparing a variegated population of DNA
inserts each of which comprises a first
sequence which codes on expression for a potential
binding domain and a second sequence encoding a
signal directing that the encoded protein be
displayed on the outer surface of a chosen
replicable genetic package, and

ii) incorporating the resulting population of DNA
constructs into the chosen replicable genetic
packages to produce a population of replicable
genetic packages.

In a preferred embodiment, the potential binding
protein-encoding inserts are incorporated into a gene
encoding an outer-surface protein of the replicable
genetic package.

The invention encompasses the design and synthesis
of variegated DNA encoding a family of potential
binding proteins characterized by constant and variable
regions, said proteins being designed with a view
toward obtaining a protein that binds a predetermined
target.

For the purposes of this invention, the term
"potential binding protein" refers to a protein encoded
by one species of DNA molecule in a population of
[...] proteins that in fact bind to the target ("successful
binding domains"). After one or more rounds of such
enrichment, one or more of the chosen genes are
examined and sequenced. If desired, new loci of
variation are chosen. The selected daughter genes of
one generation then become the parent sequences for the
next generation of variegated DNA, beginning the next
"variegation cycle." Such cycles are continued until a
protein with the desired target affinity is obtained.
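
The repeated variegation cycle described above can be illustrated with a purely computational toy model. This sketch is not the method of the invention: real selection is carried out by binding genetic packages to the target, whereas here "affinity" is a hypothetical score (the number of positions matching an arbitrary ideal sequence), each cycle varies a chosen set of loci through all twenty amino acids, and the best-scoring variant becomes the parent of the next cycle. All names and values are assumptions made for the example.

```python
from itertools import product

RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def toy_affinity(seq, ideal):
    """Hypothetical stand-in for target affinity: count of matching positions."""
    return sum(a == b for a, b in zip(seq, ideal))


def variegate(parent, loci):
    """All sequences that differ from the parent only at the chosen loci."""
    variants = []
    for combo in product(RESIDUES, repeat=len(loci)):
        seq = list(parent)
        for position, residue in zip(loci, combo):
            seq[position] = residue
        variants.append("".join(seq))
    return variants


def variegation_cycles(parent, ideal, loci_per_cycle, wanted_affinity):
    """Run successive cycles; the selected daughter becomes the next parent."""
    for loci in loci_per_cycle:
        population = variegate(parent, loci)                           # variegation
        parent = max(population, key=lambda s: toy_affinity(s, ideal))  # selection
        if toy_affinity(parent, ideal) >= wanted_affinity:
            break                                                      # desired affinity reached
    return parent


start = "AAAAAA"
ideal = "KRWYDE"  # arbitrary sequence defining the toy score
plan = [(0, 1), (2, 3), (4, 5)]  # loci varied in each successive cycle
print(variegation_cycles(start, ideal, plan, wanted_affinity=6))
```

Varying two loci per cycle keeps each population at 400 variants, whereas varying all six loci at once would require 64 million; this is the practical motivation for proceeding in cycles.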

The appended claims are hereby incorporated by
reference into this specification as an enumeration of
the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic showing the relationships
between various types of Binding Domains (BD).

Figure 2 is a flow chart showing the major steps used
to create a novel protein with affinity for a
predetermined target.

Figure 3 is a stereo view of a molecular model of the
coat of the bacteriophage f1.

Figure 4 is a schematic of a PBD contacting a molecule
of target material.

Figure 5 is a stereo view of a hypothetical interaction
between BPTI and myoglobin.

Figure 6 is a schematic of the binding surface of a PBD
at various stages in the process of selecting a
successful binding domain for a hypothetical target.



1.1      Bacterial Cells as Genetic Packages
1.1.1    Preferred Bacterial Cells for Use as GPs
1.1.2    Preferred Outer Surface Proteins for
         Displaying IPBDs on Bacterial Cells
1.1.3    Choice of Insertion Site for IPBD in
         Bacterial Cell OSP
1.1.4    In Vivo Selection for Pseudo-OSP Gene from
         Random DNA Inserts in Bacterial Cells
1.2      Displaying IPBD on Bacterial Spores
1.2.1    Preferred Bacterial Spores for Use as GPs
1.2.2    Preferred Outer-Surface Proteins for
         Displaying IPBD on Bacterial Spores
1.2.3    Choice of Insertion Site for IPBD in OSP
1.2.4    In Vivo Selection for Pseudo-OSP Gene from
         Random DNA Inserts in Bacterial Spores
1.3      Displaying IPBD on Outer Surface of Phages
1.3.1    Preferred Phages for Use as GPs
1.3.2    Preferred OSPs for Displaying IPBDs on Phages
1.3.3    Choice of Insertion Site for IPBD in OSP
1.3.4    In Vivo Selection for Pseudo-OSP Gene from
         Random DNA Inserts in Phages
2.       Choice of IPBD
2.1.1    Influence of target size on choice of IPBD
2.1.2    Influence of target charge on choice of IPBD
2.1.3    Other considerations in the choice of IPBD
3.       Choice of OCV
4.       Designing the osp-ipbd Gene Insert
4.1      Genetic regulation of the osp-ipbd gene
4.2      DNA sequence design
4.3      Specific DNA sequence assignment

13.1.2   The Secondary Set
13.1.3   Choice of Residues to Vary Initially
13.2     Choosing range of variation
13.3     Design of vgDNA Encoding PBD Family
14.1     Insertion of synthetic vgDNA into plasmids
14.2     Transformation of cells
14.3     Growth of the GP(vgPBD) population
15.      Isolation of GP(PBD)s with binding-to-target
         phenotypes
15.1     Attaching the target material to a column
15.2     Reducing selection due to non-specific
         binding
15.3     Eluting the column
15.4     Recovery of packages
15.5     Amplifying the enriched packages
15.6     Determining whether further enrichment is
         needed
15.7     Characterizing population
15.8     Testing of binding affinity
15.9     Other Affinity Separation Means
16.0     The Next Variegation Cycle
17.0     OTHER CONSIDERATIONS
17.1     Joint selections
17.2     Selection for non-binding
17.3     Selection of PBDs for retention of structure
17.4     Created binding proteins not unique
17.5     Other modes of mutagenesis possible

Example 1  Derivation of Novel Binding Protein for
           Myoglobin Using BPTI as IPBD, M13 as GP, and
           the Gene VIII Protein as OSP.



bind a chosen target, it is referred to herein as a
"binding domain" (BD). A preliminary operation is to
engineer the appearance of a stable protein domain,
denoted as an "initial potential binding domain"
(IPBD), on the surface of a genetic package. The
present invention is concerned with the expression of
numerous, diverse, variant "potential binding domains"
(PBD), all related to a "parental potential binding
domain" (PPBD) such as the binding domain of a known
binding protein, and with selection and amplification
of the genes encoding the most successful mutant PBDs.
An IPBD is chosen as PPBD to the first round of
variegation. Selection-through-binding isolates one or
more "successful binding domains" (SBD). An SBD from
one round of variegation and selection-through-binding
is chosen to be the PPBD for the next round. The
invention is not, however, limited to proteins with a
single BD since the method may be applied to any or all
of the BDs of the protein, sequentially or
simultaneously. The relationships of the various BDs
are illustrated in Figure 1.

Conventionally, DNA sequences are written from 5'
to 3', left-to-right, showing only the sequence that
will appear as mRNA (with each T of DNA changed to U in
mRNA).

protein:             M - L - F -

anti-sense DNA:  5'  ATG CTT TTC ... 3'

sense DNA:       3'  TAC GAA AAG ... 5'

mRNA:            5'  AUG CUU UUC ... 3'
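
As an illustration of this strand convention, the following Python sketch derives the complementary strand and the mRNA from the written strand of the example. The function names are hypothetical and are not part of the specification; only the nine-base example sequence is taken from the text.

```python
# Minimal sketch of the strand convention in the example above.
# Function names are illustrative assumptions, not terms from the specification.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(dna: str) -> str:
    """Return the base-by-base complement, keeping left-to-right order.

    A strand written 5'->3' therefore yields its partner written 3'->5',
    which is how the two DNA strands are aligned in the example.
    """
    return "".join(COMPLEMENT[base] for base in dna)


def to_mrna(written_strand: str) -> str:
    """Return the mRNA that matches the written strand (each T becomes U)."""
    return written_strand.replace("T", "U")


written = "ATGCTTTTC"            # the strand written 5'->3' in the example
template = complement(written)   # the strand used as template, read 3'->5'
mrna = to_mrna(written)

print("written  5'", written, "3'")
print("template 3'", template, "5'")
print("mRNA     5'", mrna, "3'")
```

The sketch only restates the base-pairing rules; it does not model transcription itself.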

The complementary strand is the one used as template
for mRNA synthesis and so is called the "sense strand";
we will use this convention throughout. Although this



the analyte can be freed from the affinity material
once the impurities are washed away.

Affinity column chromatography involves chemically
attaching the affinity material to an inert solid
support matrix that is held in a column so that
solutions can be passed over the matrix in a controlled
way. Mixtures that might contain the analyte are
passed over the matrix, to which any analyte component
in the mixture adheres. Separation is achieved by
passing a gradient of some type over the matrix and
collecting fractions. It is also possible to recover
purified material from the matrix by other means after
impurities have been washed away.

An alternative to column affinity chromatography
is batch elution from an affinity matrix material held
in some container. Affinity material is chemically
bound to the matrix. A mixture that might contain the
analyte is added and the matrix is rinsed with buffer.
The material is rinsed with a series of buffers
containing increasing concentrations of solutes chosen
to wash impurities away. The analyte is recovered in
purified form either in one of the buffer fractions or
bound to the matrix.

Another alternative to column affinity
chromatography is batch elution from a plate. The
affinity material can be chemically bound to a flat
surface, such as the bottom of a polystyrene petri
dish. A mixture that might contain the analyte is added
to the plate and the plate is rinsed with a buffer.
Subsequently, the plate is washed with a series of
buffers containing increasing concentrations of solutes
chosen to separate components having lower affinity for

or cells. It has been used to separate bacteriophages
on the basis of charge (SERW87).

The present invention makes use of affinity
separation of bacterial cells, or bacterial viruses (or
other genetic packages) to enrich a population for
those cells or viruses carrying genes that code for
proteins with desirable binding properties.

In the present invention, the words "grow",
"growth", "culture", and "amplification" mean increase
in number, not increase in size of individual cells or
phage. In the present invention, the words "select"
and "selection" are used in the genetic sense; i.e., a
biological process whereby a phenotypic characteristic
is used to enrich a population for those organisms
displaying the desired phenotype. Choices or elections
to be made by humans are indicated by "choose", "pick",
"take", etc., but not "select".

The process of the present invention comprises
three major parts:

I. design and production of a replicable
genetic package (GP) that displays an IPBD on
the surface of the GP; the combination is
denoted GP(IPBD),

II. design and implementation of an affinity
separation process that separates GP(IPBD)s
that bind to a known affinity molecule from
wild-type GPs or GP(IPBD')s, neither of which
binds the known affinity molecule, and

3) designing an amino acid sequence that: a)
includes the IPBD as a subsequence and b) will
cause the IPBD to appear on the GP surface (Secs.
1.1.2, 1.2.2, 1.3.2, and 4),

4) engineering a gene, denoted osp-ipbd, that: a)
codes for the designed amino acid sequence, b)
provides the necessary genetic regulation, and c)
introduces convenient sites for genetic
manipulation (Secs. 4.1, 4.2, 4.3, 5.1, and 5.2),

5) cloning the osp-ipbd gene into the GP (Sec.
6.1), and

6) harvesting the transformed GPs (Sec. 7) and
testing them for presence of IPBD on the GP
surface (Sec. 8); this test is performed with an
affinity molecule having high affinity for IPBD,
denoted AfM(IPBD).

In another preferred embodiment, Part I of the process
involves:

1) choosing a GP such as a bacterial cell (Sec.
1.1.1), bacterial spore (1.2.1), or phage (1.3.1)
having a suitable outer surface protein (Secs.
1.1.2, 1.2.2 and 1.3.2),

2) choosing a stable IPBD (Sec. 2),

3) designing a DNA sequence that: a) encodes the
IPBD as a subsequence and b) contains suitable
restriction sites so that random DNA may be
operably linked to the ipbd gene fragment; and c)

domain. References to PBD or pbd in Part I are to
indicate a preparatory intent.

In Part II we optimize separation of GP(IPBD) from
wild-type GP, denoted wtGP, based on the affinity of
IPBD for AfM(IPBD). To establish the sensitivity of
the affinity separation process, we separate small
amounts of GP(IPBD) from much larger amounts of wtGP.
In a preferred embodiment, Part II of the process of
the present invention involves:

1) preparing affinity columns bearing AfM(IPBD) at
various densities of AfM(IPBD)/(volume of matrix)
(Sec. 10.1),

2) preparing GP(IPBD)s with various amounts of
IPBD per GP,

3) picking a gradient regime for eluting the
columns (Sec. 10.1),

4) determining which combination of: a) IPBD/GP,
b) density of AfM(IPBD)/(volume of support), c)
initial ionic strength, d) elution rate, and e)
(amount of GP)/(volume of support) loaded, gives
the best separation of GP(IPBD) from wtGP (Sec.
10.1),

5) determining the smallest amount of GP(IPBD)
that can be isolated from a much larger amount of
wtGP using the optimal condition (Sec. 10.2), and

6) determining the efficiency of the affinity
separation procedure (Sec. 10.3).
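
The sensitivity question raised in steps 5) and 6) can be illustrated with a simple model. The Python sketch below assumes, purely for illustration, that each pass through the affinity separation multiplies the ratio of GP(IPBD) to wtGP by a constant enrichment factor. The model, the function names, and the numbers used (one GP(IPBD) per million packages, 100-fold enrichment per pass) are assumptions and are not taken from the specification.

```python
# Illustrative model only: constant-factor enrichment per separation pass.
# All names and numbers are hypothetical.


def fraction_after_passes(initial_fraction: float,
                          enrichment_per_pass: float,
                          passes: int) -> float:
    """Fraction of GP(IPBD) in the population after a number of passes."""
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must lie strictly between 0 and 1")
    # Work with the binder:non-binder ratio, which is what gets multiplied.
    ratio = initial_fraction / (1.0 - initial_fraction)
    ratio *= enrichment_per_pass ** passes
    return ratio / (1.0 + ratio)


def passes_needed(initial_fraction: float,
                  enrichment_per_pass: float,
                  target_fraction: float) -> int:
    """Smallest number of passes that reaches the target fraction."""
    if enrichment_per_pass <= 1.0:
        raise ValueError("enrichment_per_pass must exceed 1 to enrich at all")
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    passes = 0
    while fraction_after_passes(initial_fraction, enrichment_per_pass,
                                passes) < target_fraction:
        passes += 1
    return passes


# Hypothetical case: 1 binder per 10**6 packages, 100-fold enrichment per pass.
n = passes_needed(1e-6, 100.0, 0.5)
print("passes needed to reach a majority of binders:", n)
```

Under this assumed model, a more sensitive separation (a larger enrichment factor per pass) reduces the number of passes needed to recover a rare GP(IPBD) from a large excess of wtGP.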
3) picking a set of several residues in the PPBD
to vary; the principal indicators of which
residues to vary include: a) the 3D structure of
the IPBD, b) sequences of homologous proteins, and
c) computer or theoretical modeling that indicates
which residues can tolerate different amino acids
without disrupting the underlying structure (Sec.
13.1),

4) picking a subset of the residues picked in Part
III.3, to be varied simultaneously (Sec. 13.1);
the principal considerations are the number of
different variants and which variants are within
the detection capabilities of the affinity
separation determined in Part II, and setting the
range of variation (Sec. 13.2);

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene
that encodes the residues to be varied using a
specific mixture of nucleotide substrates for
some or all of the bases encoding residues
slated for variation, thereby creating a
population of DNA molecules, denoted vgDNA
(Sec. 13.3),

b) ligating this vgDNA, by standard methods,
into the operative cloning vector (OCV) (e.g.,
a plasmid or bacteriophage) (Sec. 14.1),

c) using the ligated DNA to transform cells,
thereby producing a population of transformed
cells (Sec. 14.2),
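
The consideration in step 4), the number of different variants, can be made concrete with a short calculation. The Python sketch below assumes, as an illustration only, that every varied codon is synthesized with an "NNK" nucleotide mixture (any base at the first two positions, G or T at the third); the specification does not prescribe this mixture at this point. The function names and the limit of 10**8 variants are likewise hypothetical.

```python
# Illustrative count of DNA and protein variants for a variegated region.
# The NNK mixture and the 10**8 limit are assumptions, not from the specification.
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def diversity(mixture, n_codons):
    """Return (DNA variants, protein variants) for n_codons varied codons.

    `mixture` gives the bases allowed at each of the three codon positions.
    Stop codons ('*') are excluded from the protein count.
    """
    codons = ["".join(c) for c in product(*mixture)]
    residues = {CODON_TABLE[c] for c in codons} - {"*"}
    return len(codons) ** n_codons, len(residues) ** n_codons


def max_varied_codons(mixture, variant_limit):
    """Largest number of codons whose DNA variants stay within variant_limit."""
    n = 0
    while diversity(mixture, n + 1)[0] <= variant_limit:
        n += 1
    return n


NNK = ("ACGT", "ACGT", "GT")
dna_variants, protein_variants = diversity(NNK, 5)
print("5 NNK codons:", dna_variants, "DNA variants,",
      protein_variants, "protein variants")
print("codons variable within 10**8 variants:", max_varied_codons(NNK, 10**8))
```

Because the variant count grows exponentially with the number of codons varied at once, only a handful of residues can be varied simultaneously before the population exceeds what a transformation and an affinity separation can sample.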
Abbreviation    Meaning

GP              Genetic Package, e.g. a bacteriophage
wtGP            Wild-type GP
X               Any protein
x               The gene for protein X
IPBD            Initial Potential Binding Domain, e.g.
                BPTI
PBD             Potential Binding Domain, e.g. a
                derivative of BPTI
SBD             Successful Binding Domain, e.g. a
                derivative of BPTI selected for binding
                to a target
PPBD            Parental Potential Binding Domain, i.e.
                an IPBD or an SBD from a previous
                selection
OSP             Outer Surface Protein, e.g. coat protein
                of a phage or LamB from E. coli
OSP-PBD         Fusion of an OSP and a PBD, order of
                fusion not specified
OSTS            Outer Surface Transport Signal
GP(x)           A genetic package containing the x gene
GP(X)           A genetic package that displays X on its
                outer surface
GP(osp-pbd)     GP containing an osp-pbd gene
GP(OSP-PBD)     A genetic package that displays PBD on
                its outside as a fusion to OSP
GP(pbd)         GP containing a pbd gene, osp implicit
GP(PBD)         A genetic package displaying PBD on its
                outside, OSP unspecified
{Q}             An affinity matrix supporting "Q", e.g.
                {T4 lysozyme} is T4 lysozyme attached to
                an affinity matrix
AfM(W)          A molecule having affinity for "W", e.g.
                trypsin is an AfM(BPTI)
AfM(W)*         AfM(W) carrying a label, e.g. 125I
XINDUCE         A chemical that can induce

                can be synthesized in acceptable yield
                Yield of plasmid DNA per volume of
                culture
L_eff           DNA ligation efficiency
                Maximum number of transformants produced
                from Y_D100 of insert
C_eff           Efficiency of chromatographic
                enrichment, enrichment per pass
                Sensitivity of chromatographic
                separation, can find 1 in N
N_chrom         Maximum number of enrichment cycles per
                variegation cycle
                Error level in synthesizing vgDNA

Sec. 0.1 Standard sequencing method:

The present invention is not limited to a single
method of determining the sequence of nucleotides (nts)
in DNA subsequences. In the preferred embodiment,
plasmids are isolated and denatured in the presence of
a sequencing primer, about 20 nts long, that anneals to
a region adjacent, on the 5' side, to the region of
interest. This plasmid is then used as the template in
the four sequencing reactions with one dideoxy
substrate in each. Sequencing reactions, agarose gel
electrophoresis, and polyacrylamide gel electrophoresis
(PAGE) are performed by standard procedures (AUSU87).

The present invention is not limited to a single
method of determining protein sequences, and reference
in the appended claims to determining the amino acid
sequence of a domain is intended to include any
practical method or combination of methods, whether
direct or indirect. The preferred method, in most
cases, is to determine the sequence of the DNA that
encodes the protein and then to infer the amino acid
sequence. In some cases, standard methods of
protein-sequence determination may be needed to detect
post-translational processing.
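
Inferring the amino acid sequence from a determined DNA sequence is a table lookup, which the following Python sketch illustrates using the standard genetic code. The function name is hypothetical, the input is assumed to be the strand that reads like the mRNA, and, as the paragraph above notes, such an inference cannot reveal post-translational processing.

```python
# Minimal sketch: infer a protein sequence from DNA with the standard genetic code.
# The function name and example inputs are illustrative assumptions.
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def infer_protein(coding_dna: str) -> str:
    """Translate DNA (written 5'->3', reading like the mRNA) codon by codon.

    Translation starts at the first base and stops at the first stop codon
    or when fewer than three bases remain.
    """
    dna = coding_dna.upper().replace(" ", "")
    residues = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":      # stop codon ends the protein
            break
        residues.append(amino_acid)
    return "".join(residues)


print(infer_protein("ATG CTT TTC"))  # one-letter amino acid codes
```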



The major steps in the process of making and
isolating a novel binding protein with affinity for a
chosen target material are illustrated in Figure 2.

Part I: Identification of Genetic Package and Means of
Displaying a Heterologous Binding Domain On Its Outer
Surface:

Sec. 1.0: General Requirements for Genetic Packages

It is emphasized that the GP on which
selection-through-binding will be practiced must be
capable, after the selection, either of growth in some
suitable environment or of in vitro amplification and
recovery of the encapsulated genetic message. During at
least part of the growth, the increase in number must be
approximately exponential with respect to time. The
component of a population that exhibits the desired
binding properties may be quite small, for example, one
in 10^ or less. Once this component of the population
is separated from the non-binding components, it must
be possible to amplify it. Culturing viable cells is
the most powerful amplification of genetic material
known and is preferred. Genetic messages can also be
amplified in vitro, but this is not preferred.
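
The requirement for approximately exponential growth can be illustrated with a short calculation of how many doublings are needed to amplify a small recovered component. The Python sketch below is illustrative only; the function names and the numbers (1000 recovered packages, a target of 10**9, a 30-minute doubling time) are assumptions and are not taken from the specification.

```python
# Illustrative arithmetic for exponential amplification of a recovered GP population.
# All names and numbers are hypothetical.


def doublings_needed(start_count: int, target_count: int) -> int:
    """Smallest number of doublings that takes start_count to target_count."""
    if start_count <= 0 or target_count <= 0:
        raise ValueError("counts must be positive")
    doublings = 0
    count = start_count
    while count < target_count:
        count *= 2
        doublings += 1
    return doublings


def growth_time_minutes(start_count: int, target_count: int,
                        doubling_time_minutes: float) -> float:
    """Time to reach target_count, assuming purely exponential growth."""
    return doublings_needed(start_count, target_count) * doubling_time_minutes


# Hypothetical case: 1000 recovered packages amplified to 10**9.
print("doublings:", doublings_needed(1000, 10**9))
print("minutes at a 30-minute doubling time:",
      growth_time_minutes(1000, 10**9, 30))
```

Because the count doubles at each step, even a component recovered at very low abundance can be restored to a workable population in a modest number of generations.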

A GP may typically be a vegetative bacterial cell,
a bacterial spore or a bacterial DNA virus. A strain
of any living cell or virus is potentially useful if
the strain can be:

1) maintained in culture,

2) affinity separated and retain its viability,

3) genetically altered with reasonable facility,
and

4) manipulated to display the potential binding
protein domain where it can interact with the
target material during affinity separation.

We believe that it is possible to cause a genetic
package to display the IPBD or PBD on its outer surface
without adversely affecting the viability of the GP or
the binding characteristics of the IPBD or PBD.

It is generally believed that the part of the
polypeptide chain composing one domain folds almost
independently of the parts composing other domains.
There are natural proteins composed of two or more
domains for which there is strong evidence that
essentially the same domain occurs more than once, for
example ovomucoids and ovoinhibitors (SCOT87) and
kallikrein (CHUN86). Furthermore, essentially the same
domain can occur in several different proteins (SUDH85,
GILB85, and SCOT87).

Rossman (ROSS81) and others have pointed out that
the 3D structure of individual domains can be preserved
during protein evolution, even after the amino acid
sequences have diverged so much that no significant
homology can be detected. Hollecker and Creighton
(HOLL83) studied the folding pathways of two black
mamba venom proteins (called I and K) that are
homologous to BPTI. Although the sequences of I and K
are clearly related to BPTI by the identity of 19 and
23 residues respectively, including all six cysteine
residues, there are 33 and 34 differences. Not only
are the 3D structures of the proteins very similar, but
the pathway of folding has also been conserved.

When gene fragments coding for two domains from
different proteins have been joined by genetic
engineering and expressed, the domains from the
original proteins sometimes fold independently while
tethered to each other (TOTH86, SMIT85, MANO86). If
the insertion is the gene for the entire protein, that
protein may be converted into a domain of the larger
protein. Fusions of genes that determine the domains,
however, must be done at or near domain junctions, or
domain function may be impaired (TOTH86). In
some cases, the inserted domain will fold, but the
recipient protein will not; Beckwith's fusions of malF
and phoA genes (BECK83, MANO86) gave rise to functional
PhoA domains attached to a fragment of MalF that
anchored the chimeric protein in the lipid bilayer.
The MalF protein was incomplete and could not function.


There are two basic methods of arranging that the
ipbd gene is expressed in such a manner that the IPBD
is displayed on the outer surface of the GP.

First, DNA encoding the IPBD sequence may be
operably linked to DNA encoding all or part of an outer
surface protein (OSP) native to the GP. If one or more
fusions of fragments of x genes to fragments of a
natural osp gene are known to cause X protein domains
to appear on the GP surface, then we pick the DNA
sequence in which an ipbd gene fragment replaces the x
gene fragment in one of the successful fusions as
a preferred gene to be tested for the display-of-IPBD
phenotype. (The gene may be constructed in any
manner.) If no fusion data are available, then we fuse
an ipbd fragment to various fragments, such as
fragments that end at known or predicted domain
boundaries, of the osp gene and obtain GPs that display
the osp-ipbd fusion on the GP outer surface by
screening or selection for the display-of-IPBD
phenotype. The fusion of ipbd and osp fragments may
also include fragments of random or pseudorandom DNA to
produce a population, members of which may display IPBD
on the GP surface. The members displaying IPBD are
isolated by screening or selection for the
display-of-binding phenotype.


While rr.ost bacterial proteins remain in the 
cytoplasm, others are transported to the poripiasmic 
space (which lies between the plasma membrane and the 
cell wall of gr.^r.-ncgativc bacteria), or arc conveyed 
and anchored to the outer surface of the cell. Still 



others are ex-porLoU (Gccr^ccd) inco the r.ediun^ 
surrounding the coll. Thcue charoctcr ist ics of a 
protein that are recorjnizcd by a cell and chat cause it 
to be transported out of the c/toplasn and di:iplayed on 
the cell *=;urface will be tcrr.ed "yuccr-surCacc 
transport .signals" . 

It is believed that the conditions for an outer
surface transport signal are not particularly
stringent, i.e., a random polypeptide of appropriate
length (preferably 30-100 amino acids) has a reasonable
chance of providing such a signal. Thus, by
constructing a chimeric gene comprising a segment
encoding the IPBD linked to a segment of random or
pseudorandom DNA (the potential OSTS), and placing this
gene under control of a suitable promoter, there is a
possibility that the chimeric protein so encoded will
function as an OSP-IPBD.

This possibility is greatly enhanced by
constructing numerous such genes, each having a
different potential OSTS, cloning them into a suitable
host, and selecting for transformants bearing the IPBD
(or other marker) on their outer surface.
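
The benefit of constructing numerous genes can be illustrated with a short calculation. The sketch below assumes that each clone independently has the same probability of carrying a functional transport signal; that probability (here 0.01) is a hypothetical value chosen for illustration, not a measured one.

```python
def prob_at_least_one(p_single: float, n_clones: int) -> float:
    """Probability that at least one of n independent clones carries a working signal."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    # Complement of the event "every clone fails".
    return 1.0 - (1.0 - p_single) ** n_clones


if __name__ == "__main__":
    # Hypothetical per-clone success probability of 1%.
    for n in (1, 10, 100, 1000):
        print(n, round(prob_at_least_one(0.01, n), 4))
```

Under this simple model, even a small per-clone probability becomes a near certainty once the library is large, which is why selection from many transformants is preferred over testing a single construct.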

The replicable genetic entity (phage or plasmid)
that carries the osp-pbd genes (derived from the osp-
ipbd gene) through the selection-through-binding
process, see Sec. , is referred to hereinafter as the
operative cloning vector (OCV). When the OCV is a
phage, it may also serve as the genetic package. The
choice of a GP is dependent in part on the availability
of a suitable OCV and suitable OSP.

Preferably, the GP is readily stored, for example,
by freezing. If the GP is a cell, it should have a
short doubling time, such as 20-40 minutes. If the GP
is a virus, it should be prolific, e.g., a burst size
of at least 100/infected cell. GPs which are finicky
or expensive to culture are disfavored. The GP should
be easy to harvest, preferably by centrifugation. The
GP is preferably stable for a temperature range of -70
to 42°C (stable at 4°C for several days or weeks);
resistant to shear forces found in HPLC; insensitive to
UV; tolerant of desiccation; and resistant to a pH of
2.0 to 10.0, surface active agents such as SDS or
Triton, chaotropes such as 4M urea or 2M guanidinium
HCl, common ions such as K+, Na+, and SO4--, common
organic solvents such as ether and acetone, and
degradative enzymes. Finally, there must be a suitable
OCV (see Sec. 3).

Although knowledge of specific OSPs may not be
required for vegetative bacterial cells and endospores,
the user of the present invention, preferably, will
know: Is the sequence of any osp known? (preferably
yes, at least one required for phage). How does the
OSP arrive at the surface of GP? (knowledge of route
necessary, different routes have different uses, no
route preferred per se). Is the OSP
post-translationally processed? (no processing most
preferred, predictable processing preferred over
unpredictable processing). What rules are known
governing this processing, if there is any processing?
(no processing most preferred, predictable processing
acceptable). What function does the OSP serve in the
outer surface? (preferably not essential). Is the 3D
structure of an OSP known? (highly preferred). Are
fusions between fragments of osp and a fragment of x
known? Does expression of these fusions lead to X
appearing on the surface of the GP? (fusion data is as
preferred as knowledge of a 3D structure). Is a "2D"
structure of an OSP available? (in this context, a "2D"
structure indicates which residues are exposed on the
cell surface). (2D structure less preferred than 3D
structure). Where are the domain boundaries in the
OSP? (not as preferred as a 2D structure, but
acceptable). Could IPBD go through the same process as
OSP and fold correctly? (IPBD might need prosthetic
groups) (preferably IPBD will fold after same
process). Is the sequence of an osp promoter known?
(preferably yes). Is an osp gene controlled by a
regulatable promoter available? (preferably yes). What
activates this promoter? (preferably a diffusible
chemical, such as IPTG). How many different OSPs do we
know? (the more the better). How many copies of each
OSP are present on each package? (more is better).

The user will want knowledge of the physical
attributes of the GP: How large is the GP? (knowledge
useful in deciding how to isolate GPs) (preferably easy
to separate from soluble proteins such as IgGs). What
is the charge on the GP? (neutral preferred). What is
the sedimentation rate of the GP? (knowledge preferred,
no particular value preferred).

The preferred GP, OCV and OSP are those for which
the fewest serious obstacles can be seen, rather than
the one that scores highest on any one criterion.
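
The selection rule just stated (fewest serious obstacles, not the best single score) can be sketched as follows. This is only an illustration: the criterion names, the numeric scores, and the threshold that defines a "serious obstacle" are all hypothetical assumptions, not values taken from this specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Candidate:
    """A candidate GP/OCV/OSP combination with per-criterion scores in [0, 1]."""
    name: str
    scores: Dict[str, float] = field(default_factory=dict)


def serious_obstacles(candidate: Candidate, threshold: float = 0.3) -> List[str]:
    """Return the criteria on which the candidate scores below the threshold."""
    return sorted(k for k, v in candidate.scores.items() if v < threshold)


def pick_preferred(candidates: List[Candidate], threshold: float = 0.3) -> Candidate:
    """Prefer the fewest serious obstacles; break ties by higher total score."""
    return min(
        candidates,
        key=lambda c: (len(serious_obstacles(c, threshold)), -sum(c.scores.values())),
    )


# Hypothetical scores, for illustration only.
A = Candidate("A", {"storage": 1.0, "osp_3d_structure": 0.1, "inducible_promoter": 0.2})
B = Candidate("B", {"storage": 0.6, "osp_3d_structure": 0.5, "inducible_promoter": 0.6})
print(pick_preferred([A, B]).name)
```

Candidate A has the highest single score but two criteria below the threshold, so the rule selects candidate B, which has no serious obstacle.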

Next, we consider general answers to the questions
posed in this step for the cases of: a) vegetatively
growing bacterial cells (Sec. 1.1), b) bacterial spores
(Sec. 1.2), and c) phage (Sec. 1.3). Preferred OSPs for
several GPs are given in Table 2.

Sec. 1.1: Bacterial Cells as Genetic Packages:



One may choose any well-characterized bacterial
strain which may be grown in culture. The important
questions in this case are: a) do we know enough about
mechanisms that localize proteins on the outside of the
cell, b) will the IPBD fold in the environment of the
outer membrane, and c) will cells change expression of
osp-pbd, derived from osp-ipbd, during affinity
separation? Some IPBDs may need large or insoluble
prosthetic groups, such as haem or an Fe4S4 cluster,
that are available within the cell, but not in the
medium. The formation of Fe4S4 clusters found in some
ferredoxins is catalyzed by enzymes found in the cell
(BONC85). IPBDs that require such prosthetic groups
may fail to fold or function if displayed on bacterial
cells.

Sec. 1.1.1: Preferred Bacterial Cells as GP:

The species chosen should have a well-
characterized genetic system and strains defective in
genetic recombination should be available. The chosen
strain may need to be manipulated to prevent changes of
its physiological state that would alter the number or
type of proteins or other molecules on the cell surface
during the affinity separation procedure. In view of
the extensive knowledge of E. coli, a strain of E.
coli, defective in recombination, is the strongest
candidate as a bacterial GP. Other preferred
candidates are Salmonella typhimurium, Bacillus
subtilis, and Pseudomonas aeruginosa.

Induction of synthesis of engineered genes in
vegetative bacterial cells has been exercised through
the use of regulated promoters such as lacUV5, trpP, or
tac (MANI82). The factors that regulate the quantity
of protein synthesized include: a) promoter strength
(cf. HOOP87), b) rate of initiation of translation (cf.
GOLD87), c) codon usage, d) secondary structure of
mRNA, including attenuators (cf. LAND87) and
terminators (cf. YAGE87), e) interaction of proteins
with mRNA (cf. MCPH85, MILL87b, WINT87), f) degradation
rates of mRNA (cf. SUQtiaa), g) proteolysis (cf.
GOTT87). These factors are sufficiently well
understood that a wide variety of heterologous proteins
can now be produced in E. coli or B. subtilis in at
least moderate quantities (SKER88, BETT88).



Sec. 1.1.2: Preferred Outer Surface Proteins for
Displaying IPBDs on Bacterial Cells:

Gram-negative bacteria have outer-membrane
proteins (OMP), that form a subset of OSPs. Many OMPs
span the membrane one or more times. The signals that
cause OMPs to localize in the outer membrane are
encoded in the amino acid sequence of the mature
protein. Fusions of fragments of omp genes with
fragments of an x gene have led to X appearing on the
outer membrane (BENS84, CLEM81). The rules that govern
the localization of OMP-X fusion proteins are not yet
fully elucidated. Many OMPs are polymeric and non-
essential; a non-essential OMP is preferred. A non-
essential OMP for which there is knowledge of which
residues are on the cell surface is more preferred. A
non-essential OMP for which there is data showing that
X is displayed as part of an OMP-X fusion is most
preferred. If no fusion data are available, then we
fuse an ipbd fragment to various fragments of the osp
gene and obtain GPs that display the osp-ipbd fusion on
the cell outer surface by screening or selection for
the display-of-IPBD phenotype.

Oliver has reviewed mechanisms of protein
secretion in bacteria (OLIV85 and OLIV87). Nikaido and
Vaara (NIKA85) have reviewed mechanisms by which
proteins become localized to the outer membrane of
Gram-negative bacteria. For example, the LamB protein
of E. coli is synthesized with a typical signal-
sequence which is subsequently removed. Benson et al.
(BENS84) showed that LamB-LacZ fusion proteins would be
deposited in the outer membrane of E. coli when
residues 1-49 of the mature LamB protein are included
in the fusion, but that residues 1-43 are insufficient.
The rules that govern localization of proteins in the
outer membrane of Gram-negative bacteria remain vague.
Kaiser et al. (KAIS87) showed that the export signal in
Saccharomyces cerevisiae is very broad, because when
they fused random human DNA sequences to DNA coding for
mature invertase, about one fifth of the sequences
resulted in the appearance of invertase free in the
medium.

The outer membrane protein LamB of E. coli is a
porin for maltose and maltodextrin transport, and
serves as the receptor for adsorption of bacteriophages
lambda and K10. This protein has been purified to
homogeneity (EUDETS) and shown to function as a trimer
(PALV70). Mutations to phage resistance have been used
to define the parts of the LamB protein that adsorb
each phage (ROA>:S0, CLEM81, CLEM83, GEHR87). Phage-
resistance mutations are dominant (MARC83), suggesting
that there is no preferential assembly of wild-type or
mutant subunits.

In lamB+ cells, addition of maltose or
maltodextrin inhibits a form of motility called cell
swarming, and lamB mutants defective in this process
have been characterized (HEIN88). These mutations have
been sequenced and compared to the wild-type sequence
(CLEM81) and the concomitant protein domains have been
analyzed (CLEM83). Topological models have been
developed that describe the function of phage receptor
and maltodextrin transport. The models describe these
domains and their locations with respect to the
surfaces of the outer membrane (CHAR84, HEIN88).

LamB is transported to the outer membrane if a
functional N-terminal sequence is present; further, the
first 49 amino acids of the mature sequence are
required for successful transport (BENS84). Homology
between parts of LamB protein and other outer membrane
proteins OmpC, OmpF and PhoE has been detected
(NIKA84), including homology between LamB amino acids
39-43 and sequences of the other proteins. These
subsequences may label the proteins for transport to
the outer membrane. Further, monoclonal antibodies
derived from mice immunized with purified LamB have
been used to characterize four distinct topological and
functional regions, two of which are concerned with
maltose transport (GABA82).

General knowledge on processing of signal
sequences in E. coli is relevant to the present
invention both for use of E. coli per se and for use in
conjunction with filamentous phage (vide infra).
Genetic experiments on processing of signal sequences
indicate that if the S21-F22-A23 sequence is preserved,
signal peptidase (SP-I) will cleave after A23 (OLIV87).
Many examples have been cited in which the DNA coding
for the leader or signal sequence from one protein has
been attached to the DNA sequence coding for another
protein, protein X (BECK83, INOU86 Ch10, LEEC86,
MARK86, and BOQU87). Expression of such a chimeric
gene often causes protein X to appear free in the
periplasm. That is, the leader causes the new protein
to be secreted through the lipid bilayer; once in the
periplasm, it is cleaved off by SP-I.

Beckwith (BECK83 and MANO86) has shown that when
the phoA gene is inserted in frame into the coding
sequence for an integral membrane protein, for example
MalF, the PhoA domain is localized according to
where in the integral membrane protein the phoA gene
was inserted. That is, if phoA is inserted after an
amino acid which normally is found in the cytoplasm,
then PhoA appears in the cytoplasm. If phoA is
inserted after an amino acid normally found in the
periplasm, however, then the PhoA domain is localized
on the periplasmic side of the membrane, and anchored
in it.

Beckwith and colleagues (BECK88) have extended
these observations to the lacZ gene that can be
inserted into genes for integral membrane proteins such
that the LacZ domain appears in either the cytoplasm or
the periplasm according to where the lacZ gene was
inserted.

OSP-IPBD fusion proteins need not fill a
structural role in the outer membranes of Gram-negative
bacteria because parts of the outer membranes are not
highly ordered. For large OSPs there is likely to be
one or more sites at which osp can be truncated and
fused to ipbd such that cells expressing the fusion
will display IPBDs on the cell surface. If fusions
between fragments of osp and x have been shown to
display X on the cell surface, we can design an osp-
ipbd gene by substituting ipbd for x in the DNA
sequence. Otherwise, successful OMP-IPBD fusion is
preferably sought by fusing fragments of the best omp
to an ipbd, expressing the fused gene, and testing the
resultant GPs for display-of-IPBD phenotype. We use
the available data about OMP to pick the point or
points of fusion between omp and ipbd to maximize the
likelihood that IPBD will be displayed. Alternatively,
we truncate osp at several sites or in a manner that
produces osp fragments of variable length and fuse the
osp fragments to ipbd; cells expressing the fusion are
screened or selected which display IPBDs on the cell
surface. An additional alternative is to include short
segments of random DNA in the fusion of omp fragments
to ipbd and then screen or select the resulting
variegated population for members exhibiting the
display-of-IPBD phenotype.
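
The truncate-and-fuse strategy described above can be sketched in a few lines. The example assumes that truncation points are given as codon counts so that the ipbd segment stays in frame; the function name and the short DNA sequences are hypothetical placeholders, not real osp or ipbd sequences.

```python
from typing import Dict, Iterable


def fuse_at_codon_sites(osp_cds: str, ipbd_cds: str, codon_sites: Iterable[int]) -> Dict[int, str]:
    """Truncate osp after each listed codon and append ipbd in frame.

    Returns a mapping from truncation site (number of osp codons kept)
    to the fused coding sequence.
    """
    if len(osp_cds) % 3 or len(ipbd_cds) % 3:
        raise ValueError("coding sequences must contain whole codons")
    n_codons = len(osp_cds) // 3
    fusions = {}
    for site in codon_sites:
        if not 1 <= site <= n_codons:
            raise ValueError(f"site {site} lies outside the osp coding sequence")
        # Keep the first `site` codons of osp, then continue with ipbd.
        fusions[site] = osp_cds[: 3 * site] + ipbd_cds
    return fusions


# Hypothetical toy sequences (6 and 4 codons), for illustration only.
toy_osp = "ATGAAAGCTTCTGTTGCT"
toy_ipbd = "TGTCCGGAATGC"
for site, fused in fuse_at_codon_sites(toy_osp, toy_ipbd, [2, 4, 6]).items():
    print(site, fused)
```

Each fused sequence would then be expressed and the resulting GPs screened or selected for the display-of-IPBD phenotype, as described in the text.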

The promoter for the osp-ipbd gene, preferably, is
subject to regulation by a small chemical inducer, such
as isopropyl thiogalactoside (IPTG) (lac promoter). It
need not come from a natural osp gene; any regulatable
bacterial promoter can be used.

Once a genetic packaging system employing
specifically sticky so that GPs displaying incomplete
PBDs are easily removed from the population.

The random DNA can be generated from any DNA
having high sequence diversity by partially digesting
with an enzyme that cuts very often. Sau3A I, for
example, generates cohesive-ended DNA that can be
cloned into a BamH I or Bgl II site. Alternatively,
one could shear DNA having high sequence diversity,
blunt the sheared DNA with the large fragment of E.
coli DNA polymerase I (hereinafter referred to as
Klenow fragment), and clone the sheared and blunted DNA
into blunt sites of the vector (MANI82, p295, AUSU87:
5.1.1).
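
A partial digest of the kind described above can be simulated as a sketch. The model assumes that each GATC site (the Sau3A I recognition sequence, cut immediately 5' of the G) is cleaved independently with the same probability; that probability and the toy input sequence are illustrative assumptions only.

```python
import random
from typing import List

SITE = "GATC"  # Sau3A I recognition sequence; the cut falls just before the G


def find_sites(dna: str) -> List[int]:
    """Return the start index of every GATC occurrence in the sequence."""
    positions = []
    i = dna.find(SITE)
    while i != -1:
        positions.append(i)
        i = dna.find(SITE, i + 1)
    return positions


def partial_digest(dna: str, cut_probability: float, rng: random.Random) -> List[str]:
    """Cut each site independently with the given probability; return the fragments."""
    cuts = [p for p in find_sites(dna) if rng.random() < cut_probability]
    bounds = [0] + cuts + [len(dna)]
    # Skip the empty fragment that would arise from a site at position 0.
    return [dna[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    toy = "AAGATCTTGATCCC"  # hypothetical sequence, for illustration only
    print(partial_digest(toy, 0.5, random.Random(0)))
```

Every internal fragment begins with GATC, which is what makes the fragments compatible with BamH I or Bgl II cohesive ends; lowering the cut probability yields longer fragments on average.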

Sec. 1.2: Displaying IPBD on Bacterial Spores:

Bacterial spores have desirable properties as GP
candidates. Bacillus spores neither actively
metabolize nor alter the proteins on their surface.
In addition, spores are much more resistant than
vegetative bacterial cells or phage to chemical and
physical agents. Spores have the disadvantage that the
molecular mechanisms that trigger sporulation are less
well worked out than is the formation of M13 or the
export of protein to the outer membrane of E. coli.

Sec. 1.2.1: Preferred Bacterial Spores for Use as GPs:

Bacteria of the genus Bacillus form endospores
that are extremely resistant to damage by heat,
radiation, desiccation, and toxic chemicals (reviewed
by Losick et al. (LOSI86)). These spores have complex
structure and morphogenesis that is species-specific
and only partially elucidated. The following
observations are relevant to the use of Bacillus spores
as genetic packages for the purposes of the present
invention.

Plasmid DNA is commonly included in spores.
Plasmid encoded proteins have been observed on the
surface of Bacillus spores (DECROO). Sporulation
involves complex temporal regulation that is now
moderately well understood (LOSI86). Special sigma
factors, such as sigma-E, are produced during
sporulation. RNA polymerase bound to a sporulation
sigma factor recognizes promoters that are not
recognized by RNA polymerase bound to a vegetative
sigma factor. The sequences of several sporulation
promoters are known; coding sequences operatively
linked to such promoters are expressed only during
sporulation. Ray et al. (RAYC87) have shown that the
G4 promoter of B. subtilis is directly controlled by
RNA polymerase bound to sigma-E.

Donovan et al. have identified several polypeptide
components of B. subtilis spore coat (DONO87); the
sequences of two complete coat proteins and amino-
terminal fragments of two others have been determined.
Some components of the spore are synthesized in the
forespore, e.g. small acid-soluble spore proteins
(ERRI88), while other components are synthesized in the
mother cell and appear in the spore (e.g. the coat
proteins). This spatial organization of synthesis is
controlled at the transcriptional level.

Spores self-assemble, but the signals that cause
various proteins to localize in different parts of the
spore are not well understood; presumably, the signals
controlling deposition of the coat proteins from the
cytoplasm of the mother cell onto the spore coat are
embedded in the polypeptide sequence. Some, but not
all, of the coat proteins are synthesized as precursors
and are then processed by specific proteases before
deposition in the spore coat (DONO87). Viable spores
that differ only slightly from wild-type are produced
in B. subtilis even if any one of four coat proteins is
missing (DONO87). Disulfide bonds form within the
spore (thiol reducing agents are needed to solubilize
several of the proteins of the coat). The 12kd coat
protein, CotD, contains 5 cysteines. CotD also
contains an unusually high number of histidines (16)
and prolines (7). The 11kd coat protein, CotC,
contains only one cysteine and one methionine. CotC
has a very unusual amino-acid sequence with 19 lysines
(K) appearing as 9 K-K dipeptides and one isolated K.
There are also 20 tyrosines (Y) of which 10 appear as 5
Y-Y dipeptides. Peptides rich in Y and K are known to
become crosslinked in oxidizing environments (OE'v/07 3,
WAIT82, WAIT85, VAIToCj). CotC contains 16 D and E
amino acids, which nearly equals the 19 Ks. There are no
A, F, R, I, L, N, P, Q, S, or W amino acids in CotC.
Neither CotC nor CotD is post-translationally cleaved.
The proteins CotA and CotB are post-translationally
cleaved.
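
Tallies of the kind quoted for CotC (total residues, adjacent pairs, isolated residues) can be obtained with a short routine such as the sketch below. The routine treats each maximal run of a residue as a set of non-overlapping dipeptides plus any leftover single residue; the input shown is a hypothetical toy peptide, not the CotC sequence.

```python
from itertools import groupby
from typing import Tuple


def run_summary(seq: str, residue: str) -> Tuple[int, int, int]:
    """Return (total count, non-overlapping dipeptides, isolated residues) for one residue."""
    total = pairs = singles = 0
    for aa, group in groupby(seq):
        if aa != residue:
            continue
        n = len(list(group))   # length of this uninterrupted run
        total += n
        pairs += n // 2        # each two adjacent copies form one dipeptide
        singles += n % 2       # an odd run leaves one unpaired residue
    return total, pairs, singles


if __name__ == "__main__":
    toy = "MKKYYEKKDYKC"  # hypothetical peptide, for illustration only
    print("K:", run_summary(toy, "K"))
    print("Y:", run_summary(toy, "Y"))
```

Applied to a real coat protein sequence, the same function would give the counts of K-K and Y-Y dipeptides directly from the sequence.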



Endospores from the genus Bacillus are more stable
than are exospores from Streptomyces. Bacillus
subtilis forms spores in 4 to 6 hours, but Streptomyces
species may require days or weeks to sporulate. In
addition, genetic knowledge and manipulation is much
more developed for B. subtilis than for other spore-
forming bacteria. Thus Bacillus spores are preferred
over Streptomyces spores. Bacteria of the genus
Clostridium also form very durable endospores, but
Clostridia, being strict anaerobes, are not convenient
to culture. The choice of a species of Bacillus is
governed by knowledge and availability of cloning
systems and by how easily sporulation can be
controlled. A particular strain is chosen by the
criteria listed in Sec. 1.0. Spores are exposed to an
oxidative environment after release from the mother
cell, so that disulfides, if any, within the IPBD might
form. Many vegetative biochemical pathways are shut
down when sporulation begins so that prosthetic groups
might not be available.

Sec. 1.2.2: Preferred Outer-Surface Proteins for
Displaying IPBD on Bacterial Spores:

If a spore is chosen as GP, the promoter is the
most important part of the osp gene, because the
promoter of a spore coat protein is most active: a)
when spore coat protein is being synthesized and
deposited onto the spore and b) in the specific place
that spore coat proteins are being made. In B.
subtilis, some of the spore coat proteins are post-
translationally processed by specific proteases. It is
valuable to know the sequences of precursors and mature
coat proteins so that we can avoid incorporating the
recognition sequence of the specific protease into our
construction of an OSP-IPBD fusion. The sequence of a
mature spore coat protein contains information that
causes the protein to be deposited in the spore coat;
thus gene fusions that include some or all of a mature
coat protein sequence are preferred for screening or
selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD
fragments are likely to cause IPBD to appear on the
spore surface. The genes cotC and cotD are preferred
osp genes because CotC and CotD are not post-
translationally cleaved. Subsequences from cotA or
cotB could also be used to cause an IPBD to appear on
the surface of B. subtilis spores, but we must take the
post-translational cleavage of these proteins into
account. DNA encoding IPBD could be fused to a
fragment of cotA or cotB at either end of the coding
region or at sites interior to the coding region.
Spores could then be screened or selected for the
display-of-IPBD phenotype.

To date, no Bacillus sporulation promoter has been
shown to be inducible by an exogenous chemical inducer
as is the lac promoter of E. coli. Nevertheless, the
quantity of protein produced from a sporulation
promoter can be controlled by other factors, such as
the DNA sequence around the Shine-Dalgarno sequence or
codon usage. Chemically inducible sporulation
promoters can be developed if necessary.

Sec. 1.2.3: Choice of Insertion Site for IPBD in OSP
of Bacterial Spore:

The considerations governing insertion site in the
spore OSP are the same as those given in Section 1.1.3.

Sec. 1.2.4: In Vivo Selection for Pseudo-osp Genes
From Random DNA Inserts in Bacterial Spores:

Although the considerations for spores are nearly
identical to the considerations for vegetative
bacterial cells (Sec. 1.1), the available information
on the mechanisms that cause proteins to appear on
spores is meager so that use of the random-DNA approach
becomes a more attractive option.

We can use the approach described above at 1.1.4
for attaching an IPBD to an E. coli cell, except that:
a) a sporulation promoter is used, and b) no
periplasmic signal sequence should be present.

Sec. 1.3: Displaying IPBD on Outer Surface of Phage:

Sec. 1.3.1: Preferred Phage for Use as GPs:

Unlike the choice of bacterial cells and spores, the
choice of a phage depends strongly on knowledge of the
3D structure of an OSP and how it interacts with other
proteins in the capsid. The size of the phage genome
and the packaging mechanism are also important because
the phage genome itself is the cloning vector. The osp-
ipbd gene must be inserted into the phage genome;
therefore:

1) the virion must be capable of accepting the
insertion or substitution of genetic material, and

2) the genome of the phage must be small enough to
allow convenient manipulation.

Additional considerations in choosing phage are:

1) the morphogenetic pathway of the phage
determines the environment in which the IPBD will
have opportunity to fold,

2) IPBDs containing essential disulfides may not
fold within a cell,

3) IPBDs needing large or insoluble prosthetic
groups may not fold if secreted because the
prosthetic group is lacking, and

4) when variegation is introduced in Part III,
multiple infections could generate hybrid GPs that
carry the gene for one PBD but have at least some
copies of a different PBD on their surfaces; it is
preferable to minimize this possibility.
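
One common way to reason about the multiple-infection concern in item 4 is a Poisson model of infection, sketched below. This model is an assumption introduced here for illustration (it is not stated in this specification): the number of phage entering a cell is taken to be Poisson-distributed with mean equal to the multiplicity of infection (MOI).

```python
import math


def fraction_multiply_infected(moi: float) -> float:
    """Among infected cells, the fraction that received two or more phage.

    Assumes the number of phage per cell follows a Poisson distribution
    with mean `moi` (an illustrative modelling assumption).
    """
    if moi <= 0:
        raise ValueError("moi must be positive")
    p0 = math.exp(-moi)      # probability of no phage entering the cell
    p1 = moi * p0            # probability of exactly one phage
    return (1.0 - p0 - p1) / (1.0 - p0)


if __name__ == "__main__":
    for moi in (0.01, 0.1, 1.0):
        print(moi, round(fraction_multiply_infected(moi), 4))
```

Under this model, roughly 5% of infected cells are multiply infected at an MOI of 0.1, and the fraction falls further as the MOI is lowered, which suggests keeping the ratio of phage to cells low when hybrid GPs are to be avoided.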

Bacteriophages are excellent candidates for GPs
because there is little or no enzymatic activity
associated with intact mature phage, and because the
genes are inactive outside a bacterial host, rendering
the mature phage particles metabolically inert. The
filamentous phage M13 and bacteriophage PhiX174 are of
particular interest.



The entire life cycle of the filamentous phage
M13, a common cloning and sequencing vector, is well
understood. M13 and f1 are so closely related that we
consider the properties of each relevant to both
(RASC86); any differentiation is for historical
accuracy. The genetic structure (the complete sequence
(SCHA78), the identity and function of the ten genes,
and the order of transcription and location of the
promoters) of M13 is well known, as is the physical
structure of the virion (BANN81, BOEK80, ITOK79,
KAPL78, KUHN85b, KUHN87, MAKO80, MARV78, MESS78,
OHKA81, RASC86, RUSS81, SCHA78, SMIT85, WEBS78, and
ZIMM82); see RASC86 for a recent review of the
structure and function of the coat proteins.

Filamentous phage enter E. coli through the sex
pilus of cells bearing the F-factor. Achtman et al.
(ACHT78) observed that the pilus is extraordinarily
sensitive to SDS; 0.03% SDS inhibits binding of MS2 to
pilin in vitro. Infection may therefore be inhibited
by SDS.

The 50 amino acid mature coat protein is
synthesized as a 73 amino acid precoat (ITOK79). The
first 23 amino acids constitute a typical signal-
sequence which causes the nascent polypeptide to be
inserted into the inner cell membrane.

An E. coli signal peptidase (SP-I) recognizes
amino acids 18, 21, and 23, and, to a lesser extent,
residue 22, and cuts between residues 23 and 24 of the
precoat (KUHN85a, KUHN85b, OLIV87). (See also Sec.
1.1.2 for general knowledge on secretion in E. coli.)
After removal of the signal sequence, the amino
terminus of the mature coat is located on the
periplasmic side of the inner membrane; the carboxy
terminus is on the cytoplasmic side. About 3000 copies
of the mature 50 amino acid coat protein associate
side-by-side in the inner membrane.
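
The cleavage arithmetic above (73-residue precoat, cut between residues 23 and 24, 50-residue mature coat) can be checked with a minimal sketch. The precoat sequence used below is the commonly cited M13 gene VIII product quoted from memory; it is an assumption for illustration and should be verified against a sequence database before any real use.

```python
from typing import Tuple

# Assumed M13 gene VIII precoat sequence (verify against a database before use).
PRECOAT = (
    "MKKSLVLKASVAVATLVPMLSFA"                              # residues 1-23: signal sequence
    "AEGDDPAKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"   # residues 24-73: mature coat
)


def cleave_signal(precoat: str, cut_after: int = 23) -> Tuple[str, str]:
    """Split a precursor after residue `cut_after` (1-based) into (signal, mature)."""
    if not 0 < cut_after < len(precoat):
        raise ValueError("cut position must fall inside the precursor")
    return precoat[:cut_after], precoat[cut_after:]


if __name__ == "__main__":
    signal, mature = cleave_signal(PRECOAT)
    print(len(PRECOAT), len(signal), len(mature))
    # Residues 21-23 of the precoat, the positions named in the text.
    print(signal[20:23])
```

With this sequence, residues 21 to 23 read S-F-A, consistent with the S21-F22-A23 motif mentioned earlier in connection with SP-I cleavage.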

The gene VI, VII and IX proteins are also present
at the ends of the virion in about five copies each.
The single-stranded circular phage DNA associates with
about five copies of the gene III protein and is then
extruded through the patch of membrane-associated coat
protein in such a way that the DNA is encased in a
helical sheath of protein (WEBS78). The DNA does not
base pair (that would impose severe restrictions on the
virus genome); rather the bases intercalate with each
other independent of sequence. Because the M13 genome
is extruded through the membrane and coated by a large
number of identical protein molecules, it can be used
as a cloning vector (WATS87 p273, and MESS??). Thus we
can insert extra genes into M13 and they will be
carried along in a stable manner.

Marvin and collaborators (MARV78, MAKO80, BANN81)
have determined an approximate 3D virion structure of
f1 by a combination of genetics, biochemistry, and X-
ray diffraction from fibers of the virus. Figure 3 is
drawn after the model of Banner et al. (BANN81) and
shows only the C-alphas of the protein. The apparent
holes in the cylindrical sheath are actually filled by
protein side groups so that the DNA within is
protected. The amino terminus of each protein monomer
is to the outside of the cylinder, while the carboxy
terminus is at smaller radius, near the DNA. Although
other filamentous phages (e.g. Pf1 or Ike) have
different helical symmetry, all have coats composed of
many short alpha-helical monomers with the amino
terminus of each monomer on the virion surface.

Bacteriophage PhiX174:



The bacteriophage PhiX174 is a very small
icosahedral virus which has been thoroughly studied by
genetics, biochemistry, and electron microscopy (see
The Single-Stranded DNA Phages (DENH78)). To date, no
proteins from PhiX174 have been studied by X-ray
diffraction. PhiX174 is not used as a cloning vector
because PhiX174 can accept almost no additional DNA;
the virus is so tightly constrained that several of its
genes overlap. Chambers et al. (CHAM82) showed that
mutants in gene G are rescued by the wild-type G gene
carried on a plasmid so that the host supplies this
protein.

Three gene products of PhiX174 are present on the
outside of the mature virion: F (capsid), G (major
spike protein, 60 copies per virion), and H (minor
spike protein, 12 copies per virion). The G protein
comprises 175 amino acids, while H comprises 328 amino
acids. The F protein interacts with the single-
stranded DNA of the virus. The proteins F, G, and H
are translated from a single mRNA in the viral infected
cells.

Large DNA Phages:

Phage such as lambda or T4 have much larger
genomes than do M13 or PhiX174. Large genomes are less
conveniently manipulated than small genomes. A phage
with a large genome, however, could be used if genetic
manipulation is sufficiently convenient. Phage such as
lambda and T4 have more complicated 3D capsid
structures than M13 or PhiX174, with more OSPs to
choose from. Phage lambda virions and phage T4 virions
form intracellularly, so that IPBDs requiring large or
insoluble prosthetic groups might fold on the surfaces
of these phage.

RNA Phages:

RNA phage, such as Qbeta, are not preferred
because manipulation of RNA is much less convenient
than is the manipulation of DNA. Although competent
RNA bacteriophage are not preferred, useful genetically
altered RNA-containing particles could be derived from
RNA phage, such as MS2.



MS2 is a typical small RNA phage that carries only three
genes that are tightly regulated through RNA structure and
protein-RNA interactions. The RNA fills the protein capsid
so that no additional genes can be accommodated. To use
MS2 as a GP, we would need to eliminate most of the natural
viral genome so that an osp-ipbd gene could fit into the
protein capsid. It is known that the A protein binds
sequence-specifically to a site at the 5' end of the + RNA
strand, triggering formation of RNA-containing particles if
coat protein is present. If a message containing the A
protein binding site and the gene for a chimera of coat
protein and a PBD were produced in a cell that also
contained A protein and wild-type coat protein (both
produced from regulated genes on a plasmid), then the RNA
coding for the chimeric protein would get packaged. The
viral RNA replicase gene is not needed because all
components needed for formation of particles are encoded in
DNA. A package comprising RNA encapsulated by proteins
encoded by that RNA satisfies the major criterion that the
genetic message inside the package specifies something on
the outside. The particles by themselves are not viable.
After isolating the packages that carry an SBD, we would
need to:

1) separate the RNA from the protein capsid,

2) reverse transcribe the RNA into DNA, using AMV or MMTV
reverse transcriptase, and

3) use Thermus aquaticus DNA polymerase for 25 or more
cycles of Polymerase Chain Reaction to amplify the DNA
until there is enough to subclone the recovered genetic
message into a plasmid for sequencing and further work.
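
The number of amplification cycles in step 3 can be estimated with a short calculation. The following is a minimal sketch, not part of the original procedure: it assumes idealized exponential amplification with a constant per-cycle efficiency, and the function name `pcr_cycles_needed` and any copy numbers passed to it are hypothetical.

```python
import math


def pcr_cycles_needed(start_copies: float, target_copies: float,
                      efficiency: float = 1.0) -> int:
    """Smallest whole n with start_copies * (1 + efficiency)**n >= target_copies.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means perfect doubling). Real reactions plateau, so the result
    is a lower bound on the cycles needed, not a protocol.
    """
    if start_copies <= 0 or target_copies <= 0:
        raise ValueError("copy numbers must be positive")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if target_copies <= start_copies:
        return 0
    cycles = math.log(target_copies / start_copies) / math.log(1.0 + efficiency)
    # Small tolerance guards against floating-point round-off at exact powers.
    return math.ceil(cycles - 1e-9)
```

Under these assumptions, a single template needs 25 perfect doublings to exceed thirty million copies, which is in line with the "25 or more cycles" stated above; lower efficiencies push the count higher.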



Alternatively, helper phage could be used to rescue the
isolated phage. In one of these ways we can recover a
sequence that codes for an SBD having desirable binding
properties. The in vitro amplification (SAIK85, SCHA86,
US Patents 4,683,202 and 4,683,195) may be conveniently
carried out using a Perkin-Elmer/Cetus Thermal Cycler
(part number N801-0150) and GeneAmp DNA Amplification
Reagent Kit (N801-0043) supplied by Perkin-Elmer Corp.,
761 Main Avenue, Norwalk, CT, 06859-0012, USA. The primers
used in the Polymerase Chain Reaction should be picked so
that the osp-sbd gene is the part of the
reverse-transcribed DNA that is amplified.

Although such a procedure is much more cumbersome than use
of DNA phage, it may be of interest if: 1) the genetic
package of the RNA phage is much more stable than any DNA
phage, 2) the 3D structure of an RNA phage is known (f2
forms crystals inside E. coli, suggesting that structure
determination of f2 virion may be practical), or 3) folding
of a large protein inside a cell is desired (this scheme
allows almost the entire 3.5 kb genome of MS2 to be used
for chimeric coat protein-PBD). Use of fusions involving
MS2 coat protein, together with wild-type MS2 coat protein,
to encapsulate genes demonstrates the most primitive system
that could be employed in the present invention. Although
the system has certain technical inconveniences and
therefore is not preferred, it could be used.

Sec. 1.3.2: Preferred Outer-Surface Proteins for
Displaying IPBDs on Phages:

For a given bacteriophage, the preferred OSP is usually one
that is present on the phage surface in the largest number
of copies, as this allows the greatest flexibility in
varying the ratio of OSP-IPBD to wild-type OSP and also
gives the highest likelihood of obtaining satisfactory
affinity separation. Moreover, a protein present in only
one or a few copies usually performs an essential function
in morphogenesis or infection; mutating such a protein by
addition or insertion is likely to result in reduction in
viability of the GP.

It is preferred that the wild-type osp gene be preserved.
The ipbd gene fragment may be inserted either into a second
copy of the recipient osp gene or into a novel engineered
osp gene. It is preferred that the osp-ipbd gene be placed
under control of a regulated promoter. Our process forces
the evolution of the PBDs derived from IPBD so that some of
them develop a novel function, viz. binding to a chosen
target. Placing the gene that is subject to evolution on a
duplicate gene is an imitation of the widely accepted
scenario for the evolution of protein families. It is now
generally accepted that gene duplication is the first step
in the evolution of one protein from an ancestral protein.
By having two copies of a gene, the affected physiological
process can tolerate mutations in one of the genes. This
process is well understood and documented for the globin
family (cf. DICK83, p65ff, and CREI84, p117-125).

The preferred OSP for use when the GP is M13 is the gene
III protein (see Example 1).

Sec. 1.3.3: Choice of Insertion site for IPBD in OSP:

The user must choose a site in the candidate OSP gene for
inserting an ipbd gene fragment. The coats of most
bacteriophage are highly ordered. Filamentous phage can be
described by a helical lattice; isometric phage, by an
icosahedral lattice. Each monomer of each major coat
protein sits on a lattice point and makes defined
interactions with each of its neighbors. Proteins that fit
into the lattice by making some, but not all, of the normal
lattice contacts are likely to destabilize the virion by:
a) aborting formation of the virion, b) making the virion
unstable, or c) leaving gaps in the virion so that the
nucleic acid is not protected. Thus in bacteriophage,
unlike the cases of bacteria and spores, it is important to
retain most or all of the residues of the parental OSP in
engineered OSP-IPBD fusion proteins.

Association of proteins into dimers, trimers, or even
larger structures represents yet another aspect of protein
binding. For proteins that form such associations,
heterologous mixtures of mutant and normal proteins will
form if the mutations have not altered the interface
between subunits. For example, Ward et al. have shown that
tyrosyl tRNA synthetase will form heterodimers when mutant
and normal protein are allowed to refold together (WARD86).
See also Hickman and Levy (HICK88), who studied the
multimeric structures of the Tet^R protein by engineering
cells to carry two different tet alleles and observing a
Tet^R phenotype arising from the complementary alleles.
They conclude that the Tet^R protein is multimeric.

Immunoglobulin formation depends on the ability of VH
domains and VL domains, each a part of a separately
synthesized protein, to associate independently of the
protein sequence in the antigen
complementarity-determining regions. In addition, the
process of immune complementation depends on the
separability of the binding properties of the
complementarity-determining regions from the binding
properties of the constant domains.



Auditore-Hargreaves, in US Patents 4,470,925 (AUDI84a) and
4,479,895 (AUDI84b), teaches methods of making hybrid
antibodies that depend on association of different
antibody chains. These patents teach that alterations far
from the intermolecular interface do not alter the
association.

A preferred site for insertion of the ipbd gene into the
phage osp gene is one in which: a) the IPBD folds into its
original shape, b) the OSP domains fold into their original
shapes, and c) there is no interference between the two
domains. It is not required that the IPBD and OSP domains
have any particular spatial relationship; hence the process
of this invention does not require use of the method of US
Patent '692.

If there is a 3D model of the phage that indicates that
either the amino or carboxy terminus of an OSP is exposed
to solvent, then the exposed terminus of that mature OSP
becomes the prime candidate for insertion of the ipbd gene.
A low resolution 3D model suffices.

In the absence of a 3D structure, the amino and carboxy
termini of the mature OSP are the best candidates for
insertion of the ipbd gene. A functional fusion may require
additional residues between the IPBD and OSP domains to
avoid unwanted interactions between the domains.
Random-sequence DNA, or DNA coding for a specific sequence
of a protein homologous to the IPBD or OSP, can be inserted
between the osp fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also a good
approach for obtaining a functional fusion. Smith exploited
such a boundary when subcloning heterologous DNA into gene
III of f1 (SMIT85).

There are several methods of identifying domains. Methods
that rely on atomic coordinates have been reviewed by Janin
and Chothia (JANI85). These methods use matrices of
distances between alpha carbons (C-alpha), dividing planes
(cf. ROSE85), or buried surface (RASH84). Chothia and
collaborators have correlated the behavior of many natural
proteins with domain structure (according to their
definition). Rashin correctly predicted the stability of a
domain comprising residues 206-316 of thermolysin (VITA84,
RASH84).

Many researchers have used partial proteolysis and protein
sequence analysis to isolate and identify stable domains.
(See, for example, VITA84, POTE83, SCOT87, and PABO79.)
Pabo et al. used calorimetry as an indicator that the cI
repressor from the coliphage lambda contains two domains;
they then used partial proteolysis to determine the
location of the domain boundary.



It is generally believed that the part of the polypeptide
chain composing one domain folds almost independently of
the parts composing other domains. There are natural
proteins composed of two or more domains for which there is
strong evidence that essentially the same domain occurs
more than once, for example ovomucoids and ovoinhibitors
(SCOT87) and kallikrein (CHUN86). Further, the same domain
can occur in several different proteins (SUDH85, GILB85,
and SCOT87).

If the only structural information available is the amino
acid sequence of the candidate OSP, we can use the sequence
to predict turns and loops. There is a high probability
that some of the loops and turns will be correctly
predicted (cf. Chou and Fasman, (CHOU72)); these locations
are also candidates for insertion of the ipbd gene
fragment.

Sec. 1.3.4: In Vivo Selection for Pseudo-osp Gene from
Random DNA Inserts:

Alternatively, a functional insertion site may be
determined by generating a number of recombinant
constructions and selecting the functional strain by
phenotypic characteristics. Because the OSP-IPBD must
fulfill a structural role in the phage coat, it is unlikely
that any particular random DNA sequence coupled to the ipbd
gene will produce a fusion protein that fits into the coat
in a functional way. Nevertheless, random DNA inserted
between large fragments of a coat protein gene and the ipbd
gene will produce a population that is likely to contain
one or more members that display the IPBD on the outside of
a viable phage. A display probe, similar to that defined in
1.1.4, is constructed and random DNA sequences cloned into
appropriate sites.



Sec. 2: Choice of IPBD:

An IPBD may be chosen from naturally occurring proteins or
domains of naturally occurring proteins, or may be designed
from first principles. A designed protein may have
advantages over natural proteins if: a) the designed
protein is more stable, b) the designed protein is smaller,
and c) the charge distribution of the designed protein can
be specified more freely.

A candidate IPBD must meet the following criteria:

1) a domain exists that will remain stable under the
conditions of its intended use (the domain may comprise the
entire protein that will be inserted, e.g. BPTI),

2) knowledge of the amino acid sequence is obtainable,

3) knowledge of the identity of the residues on the
domain's outer surface, and their spatial relationships, is
obtainable, and

4) a molecule is available having specific and high
affinity for the IPBD, AfM(IPBD).

Preferably, the IPBD is no larger than necessary because it
is easier to arrange restriction sites in smaller
amino-acid sequences and because a smaller protein
minimizes the metabolic strain on the GP or the host of the
GP. The usefulness of candidate IPBDs that meet all of
these requirements depends on the availability of the
information discussed below.

Information about candidate IPBDs that will be used to
judge the suitability of the IPBD includes:

1) a 3D structure (knowledge strongly preferred),

2) one or more sequences homologous to the IPBD (the more
homologous sequences known, the better),

3) the pI of the IPBD (knowledge necessary in some cases),

4) the stability and solubility as a function of
temperature, pH and ionic strength (preferably known to be
stable over a wide range and soluble in conditions of
intended use),

5) ability to bind metal ions such as Ca++ or Mg++
(knowledge preferred; binding per se, no preference),

6) enzymatic activities, if any (knowledge preferred;
activity per se has uses but may cause problems),

7) binding properties, if any (knowledge preferred,
specific binding also preferred),

8) availability of a molecule having specific and strong
affinity (Kd < 10^-11 M) for the IPBD (preferred),

9) availability of a molecule having specific and medium
affinity (10^-8 M < Kd < 10^-6 M) for the IPBD (preferred),

10) the sequence of a mutant of IPBD that does not bind to
the affinity molecule(s) (preferred), and

11) absorption spectrum in visible, UV, NMR, etc.
(characteristic absorption preferred).

If only one species of molecule having affinity for IPBD
(AfM(IPBD)) is available, it will be used to: a) detect the
IPBD on the GP surface, b) optimize expression level and
density of the affinity molecule on the matrix (Sec. 10.1),
and c) determine the efficiency and sensitivity of the
affinity separation (Secs. 10.2 and 10.3). As noted above,
however, one would prefer to have available two species of
AfM(IPBD), one with high and one with moderate affinity for
the IPBD. The species with high affinity would be used in
initial detection and in determining efficiency and
sensitivity (10.2 and 10.3), and the species with moderate
affinity would be used in optimization (10.1).

There are many candidate IPBDs, 20 or more, for which all
of the above information is available or is reasonably
practical to obtain, for example, bovine pancreatic trypsin
inhibitor (BPTI, 58 residues), crambin (46 residues), third
domain of ovomucoid (56 residues), T4 lysozyme (164
residues), and azurin (128 residues). Structural
information can be obtained from X-ray or neutron
diffraction studies, NMR, chemical cross linking or
labeling, modeling from known structures of related
proteins, or from theoretical calculations. 3D structural
information obtained by X-ray diffraction, neutron
diffraction or NMR is preferred because these methods allow
localization of almost all of the atoms to within defined
limits.

Most of the PBDs derived from a PPBD according to the
process of the present invention affect residues having
side groups directed toward the solvent. Reidhaar-Olson and
Sauer (REID88) found that exposed residues can accept a
wide range of amino acids, while buried residues are more
limited in this regard. Surface mutations typically have
only small effects on melting temperature of the PBD, but
may reduce the stability of the PBD. Hence the chosen IPBD
should have a high melting temperature (60°C acceptable,
the higher the better) and be stable over a wide pH range
(8.0 to 3.0 acceptable; 11.0 to 2.0 preferred), so that the
SBDs derived from the chosen IPBD by mutation and
selection-through-binding will retain sufficient stability.
Preferably, the substitutions in the IPBD yielding the
various PBDs do not reduce the melting point of the domain
below 50°C. Mutations may arise that increase the stability
of SBDs relative to the IPBD, but the process of the
present invention does not depend upon this occurring.

Two general characteristics of the target molecule, size
and charge, make certain classes of IPBDs more likely than
other classes to yield derivatives that will bind
specifically to the target. Because these are very general
characteristics, one can divide all targets into six
classes: a) large positive, b) large neutral, c) large
negative, d) small positive, e) small neutral, and f) small
negative. A small collection of IPBDs, one or a few
corresponding to each class of target, will contain a
preferred candidate IPBD for any chosen target.



Alternatively, the user may elect to engineer a GP(IPBD)
for a particular target; Sec. 2.1 gives criteria that
relate target size and charge to the choice of IPBD.

Sec. 2.1.1: Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule, a
preferred embodiment of the IPBD is a small protein such as
BPTI from Bos taurus (58 residues), crambin from rape seed
(46 residues), or the third domain of ovomucoid from
Coturnix coturnix japonica (Japanese quail) (56 residues)
(PAPA82), because targets from this class have clefts and
grooves that can accommodate small proteins in highly
specific ways. If the target is a macromolecule lacking a
compact structure, such as starch, it should be treated as
if it were a small molecule. Extended macromolecules with
defined 3D structure, such as collagen, should be treated
as large molecules.

If the target is a small molecule, such as a steroid, a
preferred embodiment of the IPBD is a protein the size of
ribonuclease from Bos taurus (124 residues), ribonuclease
from Aspergillus oryzae (104 residues), hen egg white
lysozyme from Gallus gallus (129 residues), azurin from
Pseudomonas aeruginosa (128 residues), or T4 lysozyme (164
residues), because such proteins have clefts and grooves
into which the small target molecules can fit. The
Brookhaven Protein Data Bank contains 3D structures for all
of the proteins listed. Genes encoding proteins as large as
T4 lysozyme can be manipulated by standard techniques for
the purposes of this invention.

If the target is a mineral, insoluble in water, one must
consider the nature of the molecular surface of the
mineral. Minerals that have smooth surfaces, such as
crystalline silicon, require medium to large proteins, such
as ribonuclease, as IPBD in order to have sufficient
contact area and specificity. Minerals with rough, grooved
surfaces, such as zeolites, could be bound either by small
proteins, such as BPTI, or larger proteins, such as T4
lysozyme.



Sec. 2.1.2: Influence of target charge on choice of IPBD:

Electrostatic repulsion between molecules of like charge
can prevent molecules with highly complementary surfaces
from binding. Therefore, it is preferred that, under the
conditions of intended use, the IPBD and the target
molecule either have opposite charge or that one of them is
neutral. In some cases it has been observed that protein
molecules bind in such a way that like charged groups are
juxtaposed by including oppositely charged counter ions in
the molecular interface. Thus, inclusion of counter ions
can reduce or eliminate electrostatic repulsion and the
user may elect to include ions in the eluants used in the
affinity separation step. Polyvalent ions are more
effective at reducing repulsion than monovalent ions.
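
The size and charge guidance of Secs. 2.1.1 and 2.1.2 can be summarized as a small lookup. The sketch below is illustrative only: the text gives no numeric boundary between "small" and "large" targets, so the mass cut-off `LARGE_TARGET_DALTONS`, the `Target` record, and the function names are all assumptions introduced here.

```python
from dataclasses import dataclass

# Assumed cut-off; the text does not define where "small" ends.
LARGE_TARGET_DALTONS = 5000.0


@dataclass(frozen=True)
class Target:
    name: str
    mass_daltons: float
    net_charge: int        # sign of net charge under conditions of intended use
    compact: bool = True   # False for targets lacking compact structure (e.g. starch)


def target_class(t: Target) -> str:
    """Return one of the six classes, e.g. 'large negative'."""
    # A macromolecule lacking a compact structure is treated as a small molecule.
    is_large = t.mass_daltons >= LARGE_TARGET_DALTONS and t.compact
    size = "large" if is_large else "small"
    if t.net_charge > 0:
        charge = "positive"
    elif t.net_charge < 0:
        charge = "negative"
    else:
        charge = "neutral"
    return f"{size} {charge}"


def suggest_ipbd(t: Target) -> dict:
    """Map a target to candidate IPBDs and a preferred IPBD charge."""
    size, charge = target_class(t).split()
    if size == "large":
        # Large targets have clefts that accommodate small proteins.
        candidates = ["BPTI", "crambin", "ovomucoid third domain"]
    else:
        # Small targets fit into clefts and grooves of larger proteins.
        candidates = ["ribonuclease", "hen egg white lysozyme",
                      "azurin", "T4 lysozyme"]
    # Opposite charge is preferred, or one partner neutral.
    preferred = {"positive": "negative or neutral",
                 "negative": "positive or neutral",
                 "neutral": "any"}[charge]
    return {"class": f"{size} {charge}",
            "candidates": candidates,
            "preferred_ipbd_charge": preferred}
```

Mineral targets are not covered by this sketch, since for them the choice depends on surface roughness rather than on size alone.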

Sec. 2.1.3: Other considerations in the choice of IPBD:

If the chosen IPBD is an enzyme, it may be necessary to
change one or more residues in the active site to
inactivate enzyme function. For example, if the IPBD were
T4 lysozyme and the GP were E. coli cells or M13, we would
need to inactivate the lysozyme because otherwise it would
lyse the cells. If, on the other hand, the GP were PhiX174,
then inactivation of lysozyme may not be needed because T4
lysozyme can be overproduced inside E. coli cells without
detrimental effects and PhiX174 forms intracellularly. It
is preferred to inactivate enzyme IPBDs that might be
harmful to the GP or its host by substituting mutant amino
acids at one or more residues of the active site. It is
permitted to vary one or more of the residues that were
changed to abolish the original enzymatic activity of the
IPBD. Those GPs that receive osp-pbd genes encoding an
active enzyme may die, but the majority of sequences will
not be deleterious.

If the binding protein is intended for therapeutic use in
humans or animals, the IPBD may be chosen from proteins
native to the designated recipient to minimize the
possibility of antigenic reactions.

Sec. 3: Choice of OCV:

The OCV is preferably small, e.g., less than 10 kb. The
size of the OCV affects the stability of the OCV and its
derivatives, and the copy number thereof. An OCV which is
stable, even after insertion of at least 1 kb DNA, is
sought. A multicopy OCV is also of interest. It is
desirable that cassette mutagenesis be practical in the
OCV; preferably, at least 25 restriction enzymes are
available that do not cut the OCV. It is likewise desirable
that single-stranded mutagenesis be practical. Finally, the
OCV preferably carries a selectable marker.

If a suitable OCV does not already exist, it may be
engineered by manipulation of available vectors.
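
Whether enough non-cutting enzymes are available for a candidate OCV can be checked by scanning its sequence. The following is a minimal sketch under stated assumptions: the enzyme table `SITES` holds only a handful of well-known palindromic recognition sequences for illustration (a real survey would use a complete enzyme list), the vector is assumed circular by default, and the function names are hypothetical.

```python
# Recognition sequences of a few common type II enzymes. All are
# palindromic, so scanning one strand is sufficient.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
    "KpnI": "GGTACC",
    "XbaI": "TCTAGA",
}


def count_sites(seq: str, site: str, circular: bool = True) -> int:
    """Count occurrences of site in seq, including overlapping matches.

    For a circular vector, matches that span the origin are also counted.
    """
    s = seq.upper()
    if circular:
        s += s[: len(site) - 1]  # wrap around the origin
    return sum(1 for i in range(len(s) - len(site) + 1)
               if s.startswith(site, i))


def classify_enzymes(seq: str, sites: dict = SITES, circular: bool = True):
    """Return (non_cutters, unique_cutters) as sorted lists of enzyme names."""
    counts = {name: count_sites(seq, site, circular)
              for name, site in sites.items()}
    non_cutters = sorted(n for n, c in counts.items() if c == 0)
    unique_cutters = sorted(n for n, c in counts.items() if c == 1)
    return non_cutters, unique_cutters
```

The non-cutters are the enzymes whose sites could be introduced into a synthetic gene for cassette mutagenesis; the length of that list can be compared with the preferred minimum of 25.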

In the cases of bacterial cells and bacterial spores, the
bacterial chromosome could be used as the OCV. Plasmids
are, however, preferred because genes on plasmids are much
more easily constructed and mutated than are genes in the
bacterial chromosome. When bacteriophage are to be used,
the osp-ipbd gene must be inserted into the phage genome.
The synthetic osp-ipbd genes can be constructed in small
vectors and transferred to the GP genome when complete.

Phage such as M13 do not confer antibiotic resistance on
the host so that one can not select for cells infected with
M13. An antibiotic resistance gene can be engineered into
the M13 genome (HINE80). More virulent phage, such as
PhiX174, make discernible plaques that can be picked, in
which case a resistance gene is not essential; furthermore,
there is no room in the PhiX174 virion to add any new
genetic material. Inability to include an antibiotic
resistance gene is a disadvantage because it limits the
number of GPs that can be screened.

It is preferred that GP(IPBD) carry a selectable marker not
carried by wtGP. It is also preferred that wtGP carry a
selectable marker not carried by GP(IPBD).

Sec. 4: Designing the osp-ipbd gene insert:

Having chosen an IPBD, a GP, a strategy for getting the
IPBD onto the GP surface, and a cloning vector, we now turn
to the design of a suitably regulated gene. In this
section, we design an amino acid sequence that will cause
the IPBD to appear on the GP surface when it is expressed.
This amino acid sequence may determine the entire coding
region of the osp-ipbd gene, or may contain only the ipbd
sequence adjoining restriction sites into which random DNA
will be cloned (Sec. 6.2).

We will now consider the transcriptional regulation of the
osp-ipbd gene; the design of the DNA encoding of amino acid
sequences; the organization of synthesis; the methods of
DNA synthesis and purification; and the actual gene
synthesis and cloning.

The actual gene may be: a) completely synthetic, b) a
composite of natural and synthetic DNA, or c) a composite
of natural DNA fragments. The important point is that the
pbd segment, derived from the ipbd segment, be easily
genetically manipulated in the ways described in Part III.
A synthetic ipbd segment is preferred because it allows
greatest control over placement of restriction sites.
Primers complementary to regions abutting the osp-ipbd gene
on its 3' flank and to parts of the osp-ipbd gene that are
not to be varied are needed for sequencing.

Sec. 4.1: Genetic regulation of the osp-ipbd gene:

Now we consider regulation of the osp-ipbd gene to enable
modulation of expression. The two important questions are:
a) how much OSP-IPBD do we need on each GP, and b) how
accurately must we regulate the amount?

The essential function of the affinity separation is to
separate GPs that bear PBDs (derived from IPBD) having high
affinity for the target from GPs bearing PBDs having low
affinity for the target. If the elution volume of a GP
depends on the number of PBDs on the GP surface, then a GP
bearing many PBDs with low affinity, GP(PBDw), might
co-elute with a GP bearing fewer PBDs with high affinity,
GP(PBDs). Assume that both GP(PBDw) and GP(PBDs) bind to
the column under some condition, such as low salt. If a
gradient of some solute, such as increasing salt, changes
the conditions, then all weakly-binding PBDs will cease to
bind before any strongly-binding PBDs cease to bind.
Regulation of the osp-pbd gene must be such that all
packages display sufficient PBD to effect a good separation
in Sec. 15. If the amount of PBD/GP had an effect on the
elution volume of the GP from the affinity matrix, then we
would need to regulate the amount of PBD/GP very
accurately. The following analysis shows that there is no
strong linear effect of IPBD/GP on elution volume and
assumes only: a) that all GPs are the same size, b) that
interactions between the PBDs and the affinity matrix
dominate differential elution of GPs, c) that the system is
at equilibrium, and d) that all PBDs on any one GP are
identical.

If $N_p$ identical PBDs on a GP each have access to target
molecules, and each PBD has a free energy of binding to the
target of $\Delta G_b$, then the total free energy of
binding is

$$\Delta G_b^{\mathrm{tot}} = N_p \cdot \Delta G_b .$$

$\Delta G_b$ will be a function of several parameters of
the solvent, such as: 1) concentration of ions, 2) pH, 3)
temperature, 4) concentration of neutral solutes such as
sucrose, glucose, ethanol, etc., 5) specific ions, such as
calcium, acetate, benzoate, nicotinate, etc. If conditions
are altered during affinity separation so that $\Delta G_b$
approaches zero, $\Delta G_b^{\mathrm{tot}}$ approaches
zero $N_p$ times faster. As $\Delta G_b^{\mathrm{tot}}$
goes to or above zero, the packages will dissociate from
the immobilized target molecules and be eluted.

GPs bearing more PBDs have a sharper transition between
bound and unbound than packages with fewer of the same
PBDs. For equilibrium conditions, the mid-point of the
transition is determined only by the solution conditions
that bring the individual interactions to zero free energy.
The number of PBDs/GP determines the sharpness of the
transition.
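
The qualitative claim, that the number of PBDs per package changes the sharpness of the transition but not its mid-point, can be illustrated numerically. The sketch below is an assumption, not part of the original analysis: it treats each package as a two-state system (bound or free) whose total free energy is the number of PBDs times the per-PBD free energy, and converts that to a bound fraction with a Boltzmann weight. The function name, units, and default temperature are hypothetical choices.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def bound_fraction(delta_g_b: float, n_pbd: int, temp_k: float = 298.15) -> float:
    """Bound fraction of packages in a simple two-state equilibrium model.

    delta_g_b: free energy of binding per PBD, kcal/mol (negative = favorable).
    n_pbd:     number of identical PBDs per package with access to target.
    The total free energy is taken as n_pbd * delta_g_b.
    """
    x = (n_pbd * delta_g_b) / (R_KCAL * temp_k)
    # Numerically stable evaluation of 1 / (1 + exp(x)).
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))
```

In this model every package is half bound when the per-PBD free energy is zero, whatever the PBD count; packages with more PBDs simply switch between bound and free over a narrower range of solution conditions.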

It should also be noted that the number of PBDs/GP is
usually influenced by physiological conditions so that a
sample of genetically identical GP(PBD)s may contain GPs
having different numbers of PBDs on the GP surface. In a
population of GP(vgPBD)s each PBD sequence will appear on
more than one GP, and the actual number of PBDs/GP will
vary from GP to GP within some range. Within a variegated
population of PBDs, let PBDx be the PBD with maximum
affinity for the target. If there is a linear effect of
number of PBDs/GP, then the GPs having the greatest number
of PBDx will be most retarded on the column. When we
culture the enriched population obtained either as an
effluent from the column or as an inoculum of matrix
material from the column, the GP(PBDx) will be amplified
and give rise to new GP(PBDx)s having varying numbers of
PBDx/GP. Thus the affinity separation process of the
present invention could tolerate a linear effect of number
of PBDs/GP on the elution volume of the GP(PBD) unless
strong binding to target fortuitously causes the PBD to be
displayed on the GP only in low number. It is extremely
unlikely that all PBDs that bind to the target will also be
incapable of display in large amounts on the GP surface.

According to the above analysis, there is no linear effect
on elution volume from the number of PBDs/GP, hence need
for highly accurate regulation of IPBD/GP is not
anticipated. The analysis above assumes that GP(IPBD)s are
in equilibrium between solution in buffer and bound to the
affinity matrix. Rate of elution may be an important
parameter in column affinity chromatography. In batch
elution from an affinity matrix or elution from an affinity
plate, the time that each buffer is in contact with the
affinity material may be an important variable. The density
of affinity molecules on the matrix is an important
variable in optimizing the affinity separation. Because the
analysis above is qualitative, in Sec. 10 of the preferred
embodiment we experimentally optimize: 1) the density of
IPBD on the GP surface, 2) the density of affinity
molecules on the affinity matrix, 3) the initial ionic
strength, 4) the elution rate, and 5) the quantity of
GP/(volume of matrix) to be loaded on the column.

A number of promoters are known that can be controlled by
specific chemicals added to the culture medium. For
example, the lacUV5 promoter is induced if
isopropylthiogalactoside is added to the culture medium,
for example, at between 1.0 uM and 10.0 mM. Hereinafter, we
use "XINDUCE" as a generic term for a chemical that induces
expression of a gene.

Transcriptional regulation of gene expression is best
understood and most effective, so we focus our attention on
the promoter. If transcription of the osp-ipbd gene is
controlled by the chemical XINDUCE, then the number of
OSP-IPBDs per GP increases for increasing concentrations of
XINDUCE until a fall-off in the number of viable packages
is observed or until sufficient IPBD is observed on the
surface of harvested GP(IPBD)s. The attributes that affect
the maximum number of OSP-IPBDs per GP are primarily
structural in nature. There may be steric hindrance or
other unwanted interactions between IPBDs if OSP-IPBD is
substituted for every wild-type OSP. Excessive levels of
OSP-IPBD may also adversely affect the solubility or
morphogenesis of the GP. For cellular and viral GPs, as few
as five copies of a protein having affinity for another
immobilized molecule have resulted in successful affinity
separations (FERE82a, FERE82b, and SMIT85).
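
The titration described above, raising XINDUCE until display is sufficient or viability falls off, can be expressed as a simple selection rule. This is a hedged sketch only: the record type `TitrationPoint`, the two assay readouts, and the tolerated titer loss are hypothetical, and the default of five copies per package is borrowed from the lower bound cited above rather than from any stated requirement.

```python
from typing import NamedTuple, Optional, Sequence


class TitrationPoint(NamedTuple):
    inducer_molar: float   # XINDUCE concentration in the culture medium
    viable_titer: float    # viable packages per ml (hypothetical assay)
    ipbd_per_gp: float     # displayed OSP-IPBD copies per package (hypothetical assay)


def choose_inducer_level(points: Sequence[TitrationPoint],
                         min_ipbd_per_gp: float = 5.0,
                         max_titer_loss: float = 0.5) -> Optional[TitrationPoint]:
    """Lowest inducer level with enough display and no viability fall-off.

    Points are scanned in order of increasing inducer concentration. The
    scan stops at the first point whose viable titer has dropped by more
    than max_titer_loss relative to the lowest-inducer point.
    Returns None if no point qualifies.
    """
    if not points:
        return None
    ordered = sorted(points, key=lambda p: p.inducer_molar)
    baseline = ordered[0].viable_titer
    for p in ordered:
        if p.viable_titer < (1.0 - max_titer_loss) * baseline:
            break  # fall-off in viable packages: do not induce further
        if p.ipbd_per_gp >= min_ipbd_per_gp:
            return p
    return None
```

A return value of None would indicate that, for the construct tested, sufficient display is not reached before viability suffers, which points back to the structural limits discussed above.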

Another consideration of promoter regulation is that it is
useful later to know the range of regulation of the
osp-ipbd gene (Sec. 8). In particular, one should determine
how nearly the absence of XINDUCE leads to the absence of
IPBD on the GP surface; a non-leaky promoter is preferred.
Non-leakiness is useful: a) to show that affinity of
GP(osp-ipbd)s for AfM(IPBD) is due to the osp-ipbd gene,
and b) to allow growth of GP(osp-ipbd) in the absence of
XINDUCE if the expression of osp-ipbd is disadvantageous.
The lacUV5 promoter in conjunction with the LacIq repressor
is a preferred example.

Sec. 4.2: DNA sequence design:



The present invention is not limited to a single
method of gene design. The following procedure is an
example of one method of gene design that fills the
needs of the present invention.

Having specified that the amount of IPBD/GP is to
be experimentally optimized and that well-studied
available regulatory mechanisms applied to the osp-ipbd
gene are sufficient, we now consider design of a DNA
sequence. If the amino-acid sequence of OSP-IPBD is a
definite sequence, then the entire gene will be
constructed (Sec. 6.1). If random DNA is to be fused
to ipbd, then a "display probe" is constructed first;
the random DNA is then inserted to complete the
population of putative osp-ipbd genes (Sec. 6.2) from
which a functional osp-ipbd gene is identified by in
vivo selection or kindred techniques.



The osp-ipbd gene need not be synthesized in toto;
parts of the gene may be obtained from nature. One
may use any genetic engineering method to produce the
correct gene fusion, so long as one can easily and
accurately direct mutations to specific sites in the
pbd DNA subsequence (Sec. 14.1). In all of the methods
of mutagenesis considered in the present invention,
however, it is necessary that the DNA sequence for the
osp-ipbd gene be different from any other DNA in the
OCV. The degree and nature of difference needed is
determined by the method of mutagenesis to be used in
Sec. 14.1. If the method of mutagenesis is to be
replacement of subsequences coding for the PBD with
vgDNA, then the subsequences to be mutagenized must be
bounded by restriction sites that are unique with
respect to the rest of the OCV. If single-stranded-
oligonucleotide-directed mutagenesis is to be used,
then the DNA sequence of the subsequence coding for the
IPBD must be unique with respect to the rest of the
OCV.

The sequences of regulatory parts of the gene are
taken from the sequences of natural regulatory
elements: a) promoters, b) Shine-Dalgarno sequences,
and c) transcriptional terminators. Regulatory
elements could also be designed from knowledge of
consensus sequences of natural regulatory regions. The
sequences of these regulatory elements are connected to
the coding regions; restriction sites are also inserted
in or adjacent to the regulatory regions to allow
convenient manipulation.

The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA.
The amino acid sequences are chosen to achieve various
goals, including: a) display of an IPBD on the surface
of a GP, b) change of charge on an IPBD, and c)
generation of a population of PBDs from which to select
an SBD. The ambiguity in the genetic code is exploited
to allow optimal placement of restriction sites and to
create various distributions of amino acids at
variegated codons.

Sec. 4.3: Specific DNA sequence assignment:



A computer program may be used to construct an
ambiguous DNA sequence coding for an amino-acid
sequence given by the user. That is, the DNA sequence
contains codes for all possible DNA sequences that
produce the stated amino acid sequence. The codes used
in the ambiguous DNA are shown in Table 1. An example
of an ambiguous DNA sequence is given in Table 3.
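
The construction of an ambiguous sequence can be sketched
in a few lines of Python. This is only an illustration, not
the program referred to above: it assumes the standard
genetic code and the IUPAC one-letter ambiguity codes (the
codes of Table 1 may differ), and the function names are
hypothetical. Note that a position-by-position code is a
superset for six-codon amino acids: for serine it yields
WSN, which also covers some non-serine codons.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, keyed by the alphabetically sorted set of bases.
# Assumption: these stand in for the codes of Table 1.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def ambiguous_codon(aa):
    """Return one ambiguity code per codon position for amino acid `aa`."""
    codons = [c for c, a in CODONS.items() if a == aa]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa!r}")
    return "".join(
        IUPAC["".join(sorted({c[i] for c in codons}))] for i in range(3)
    )


def ambiguous_dna(protein):
    """Back-translate a protein into space-separated ambiguous codons."""
    return " ".join(ambiguous_codon(aa) for aa in protein.upper())


if __name__ == "__main__":
    print(ambiguous_dna("MKI"))
```

Because of the superset behaviour noted above, a real design
program would keep the explicit list of synonymous codons
for each residue alongside the ambiguous display string.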

The user supplies lists of restriction enzymes
that: a) do not cut the OCV, and b) cut the OCV only
once or twice. For each enzyme the program reads: a)
the name, b) the recognition sequence, c) the cutting
pattern, and d) the names of suppliers. The ambiguous
DNA sequence coding for the stated amino acid sequence
is examined for places that recognition sites for any
of the given enzymes could be created without altering
the amino-acid sequence. A master table of enzymes
could be obtained from the catalogues of enzyme
suppliers such as the suppliers listed in Table 4 or
other sources, such as Roberts' annual review of
restriction enzymes in Nucleic Acids Research.
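
The search for potential sites can be sketched as follows.
This is a minimal illustration under stated assumptions,
not the program described here: it uses the standard
genetic code, handles only unambiguous recognition
sequences, and the function names and the peptide "KSF" are
hypothetical. A site fits at a given offset if every codon
it touches has a synonymous codon carrying the required
bases.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous(aa):
    """All codons that encode amino acid `aa`."""
    return [c for c, a in CODONS.items() if a == aa]


def potential_sites(protein, site):
    """Return 0-based DNA offsets at which `site` could be created
    without changing the amino-acid sequence of `protein`."""
    protein, site = protein.upper(), site.upper()
    hits = []
    for start in range(3 * len(protein) - len(site) + 1):
        # Group the bases required by the site by the codon they fall in.
        needed = {}
        for i, base in enumerate(site):
            codon_index, pos = divmod(start + i, 3)
            needed.setdefault(codon_index, {})[pos] = base
        # Codons are chosen independently, so a per-codon check is exact.
        if all(
            any(all(c[p] == b for p, b in req.items())
                for c in synonymous(protein[k]))
            for k, req in needed.items()
        ):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Hind III (AAGCTT) fits Lys-Ser-Phe as AAA AGC TTy, at offset 2.
    print(potential_sites("KSF", "AAGCTT"))
```

Each hit corresponds to one record of the kind described
below; the offset plays the role of the "Loc" field.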

Each potential recognition site causes a record
similar to the following to be written:

[Example record for a potential Hind III site. Fields:
header line (Hind III, supplier codes, Loc=0, T=9, B=13,
Dir=n, Cut elective); Site Cut # 1/6; protein sequence
(K - S) and residue numbers; possible DNA; cutter; result;
and, at right, the dsDNA ends
5'nnnnnnA agcttnnnnn3' / 3'nnnnnnttcga Annnnn5'.]

The top line identifies the enzyme, Hind III in
this example, and the supplier (through codes given in
Table 4); "Loc=0" indicates that recognition begins
with nucleotide 0; "T=9" indicates that the antisense
(top) strand of DNA is cut after base 9; "B=13"
indicates that the sense strand (or bottom strand, not
shown except in the dsDNA on the right) is cut between
bases 13 and 14 (reading left to right). "Dir=n"
indicates that recognition is "normal". Hind III
recognizes palindromic sequences, as do most
restriction enzymes. Some enzymes have asymmetric
recognition, however, and cut to one side; for these
enzymes, the recognition could be "normal" or
"reversed" depending on whether the enzyme cuts to the
right or left of the recognition site. Rare
unambiguous stretches that require certain restriction
sites are labeled as "obligatory"; those that are
elective are so labeled.



The second and third lines show the amino-acid
sequence and residue numbers for which this region of
DNA codes. The notation "Cut # 1/6" indicates that
this is the first of six possible Hind III sites.
The fourth line shows the antisense strand of DNA
coding for the desired amino-acid sequence. The fifth
line shows the recognition pattern of the enzyme. The
sixth line shows the consensus between the DNA sequence
required by the amino acid sequence and the DNA
sequence recognized by the restriction enzyme. The
dsDNA to the right shows the ends generated by the
restriction digestion.

The program also prints a table summarizing the
possible sites. An example of such a summary of
potential sites is found in Table 5.

The choice of elective restriction sites to be
built into the gene is determined as follows.

The goal is to have a series of fairly uniformly
spaced unique restriction sites with no more than a
preset maximum number of bases, for example 100,
between sites. Unless required by other sites, sites
that are not present in the parental OCV are not
introduced into the designed gene more than once.
Sites that occur only once or twice in the parental OCV
are not introduced into the designed gene unless
necessary.

First, each enzyme that has a unique possible site
is picked; if two of these overlap, then the better
enzyme is picked. An enzyme is better if it: a)
generates cohesive ends, b) has unambiguous
recognition, or c) has higher specific activity. Next,
those sites close to other sites already picked are
eliminated because many sites very close together are
not useful. Finally, sites are chosen to minimize the
size of the longest piece between restriction sites.

The ambiguity of the DNA between the restriction
sites is resolved from the following considerations.
If the given amino acid sequence occurs in the
recipient organism, and if the DNA sequence of the gene
in the organism is known, then, preferably, maximize
the differences between the engineered and natural
genes to minimize the potential for recombination. In
addition, the following codons are poorly translated in
E. coli and, therefore, are avoided if possible:
cta (L), cga (R), cgg (R), and agg (R). For other host
species, different codon restrictions would be
appropriate. Finally, long repeats of any one base are
prone to mutation and thus are avoided. Balancing
these considerations, we can design a DNA sequence.
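
A candidate sequence can be screened against the last two
considerations with a short check such as the following
sketch. The four avoided codons are those named above for
E. coli; the run-length threshold (`max_run`) and the
function name are assumptions made for illustration, since
the text says only that long repeats are avoided.

```python
import re

# Codons named in the text as poorly translated in E. coli.
AVOID_ECOLI = frozenset({"CTA", "CGA", "CGG", "AGG"})


def codon_problems(dna, avoid=AVOID_ECOLI, max_run=5):
    """Return (bad_codons, long_runs) for an in-frame coding sequence.

    bad_codons: list of (codon_index, codon) for codons in `avoid`.
    long_runs:  list of (offset, run) for single-base runs longer than
                `max_run` (the threshold is an illustrative assumption).
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    bad_codons = [(i, c) for i, c in enumerate(codons) if c in avoid]
    # One base followed by at least `max_run` copies of itself.
    pattern = r"([ACGT])\1{%d,}" % max_run
    long_runs = [(m.start(), m.group()) for m in re.finditer(pattern, dna)]
    return bad_codons, long_runs


if __name__ == "__main__":
    print(codon_problems("ATGCTAAAAAAAGGG"))
```

For a host other than E. coli, a different `avoid` set would
be passed in.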

Sec. 5.1: Organization of gene synthesis:

Now we consider ways to divide the synthesis of
the designed gene into manageable segments. The
present invention is not limited as to how a designed
DNA sequence is divided for easy synthesis. The
following procedure is an example of how such synthesis
might be managed.

An established method is to synthesize both
strands of the entire gene in overlapping segments of
20 to 50 nucleotides (nts) (THER83). Below we provide
an alternative method that is more suitable for
synthesis of vgDNA. This method is similar to methods
published by Oliphant et al. (OLIP86 and OLIP87) and
Ausubel et al. (AUSU87). Our adaptation of this method
differs from previous methods in that we: a) use two
synthetic strands, and b) do not cut the extended DNA
in the middle. Our goals are: a) to produce longer
pieces of dsDNA than can be synthesized as ssDNA on
commercial DNA synthesizers, and b) to produce strands
complementary to single-stranded vgDNA. By using two
synthetic strands, we remove the requirement for a
palindromic sequence at the 3' end.

DNA synthesizers can currently produce oligo-nts
of lengths up to 100 nts in reasonable yield, N_DNA =
100. The parameters N_w (the length of overlap needed
to obtain efficient annealing) and N_s (the number of
spacer bases needed so that a restriction enzyme can
cut near the end of blunt-ended dsDNA) are determined
by DNA and enzyme chemistry. N_w = 10 and N_s = 5 are
reasonable values. Larger values of N_w and N_s are
allowed but add to the length of ssDNA that must be
synthesized and reduce the net length of dsDNA that can
be produced.

Let A_L be the actual length of dsDNA to be
synthesized, including any spacers. A_L must be no
greater than (2 N_DNA - N_w). Let Q_w be the number of
nts that the overlap window can deviate from center:

Q_w = (2 N_DNA - N_w - A_L)/2

Q_w is never negative. It is preferred that the two
fragments be approximately the same length so that the
amounts synthesized will be approximately equal. This
preference may be overridden by other considerations.
The overall yield of dsDNA is usually dominated by the
synthetic yield of the longer oligo-nt.
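
The length bookkeeping can be illustrated with a small
sketch. It assumes that the window slack is half the
difference between the maximum length (2 N_DNA - N_w) and
the actual length, rounded down to whole nts, and that each
oligo-nt runs from its 5' end through the overlap. The
function names are hypothetical.

```python
def overlap_slack(a_l, n_dna=100, n_w=10):
    """Number of nts the overlap window may move off center (Q_w).

    Assumes Q_w = (2*N_DNA - N_w - A_L) / 2, rounded down.
    """
    limit = 2 * n_dna - n_w
    if a_l > limit:
        raise ValueError(f"A_L={a_l} exceeds the limit of {limit} nts")
    return (limit - a_l) // 2


def fragment_lengths(a_l, window_start, n_w=10):
    """Lengths of the two oligo-nts for a window at `window_start`.

    `window_start` is the 0-based offset of the overlap on the top
    strand. The top oligo runs from the left end through the overlap;
    the bottom oligo runs from the right end through the overlap.
    """
    top = window_start + n_w
    bottom = a_l - window_start
    return top, bottom


if __name__ == "__main__":
    a_l = 150
    q_w = overlap_slack(a_l)          # slack available for sliding
    centered = (a_l - 10) // 2        # centered window start
    print(q_w, fragment_lengths(a_l, centered),
          fragment_lengths(a_l, centered + q_w))
```

Whatever the window position, the two lengths sum to
A_L + N_w; sliding the window by the full slack brings the
longer oligo-nt exactly to N_DNA.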



We use the following procedure to generate dsDNA
of lengths up to (2 N_DNA - N_w) nts through the use of
Klenow fragment to extend synthetic DNA fragments
that are not more than N_DNA nts long. When a pair of
long oligo-nts, complementary for N_w nts at their 3'
ends, are annealed there will be a free 3' hydroxyl and
a long ssDNA chain continuing in the 5' direction on
either side. We will refer to this situation as a 5'
superoverhang. The procedure comprises:

1) picking a non-palindromic subsequence of N_w to
N_w+4 nts near the center of the dsDNA to be
synthesized; this region is called the overlap
(typically, N_w is 10),

2) synthesizing a ssDNA molecule that comprises
that part of the anti-sense strand from its 5' end
up to and including the overlap,

3) synthesizing a ssDNA molecule that comprises
that part of the sense strand from its 5' end up
to and including the overlap,

4) annealing the two synthetic strands that are
complementary throughout the overlap region, and

5) extending both superoverhangs with Klenow
fragment and all four deoxynucleotide
triphosphates.

Because N_DNA is not rigidly fixed at 100, the
current limits of 190 (= 2 N_DNA - N_w) nts overall and
100 in each fragment are not rigid, but can be exceeded
by 5 or 10 nts. Going beyond the limits of 190 and 100
will lead to lower yields, but these may be acceptable
in certain cases.

Restriction enzymes do not cut well at sites
closer than about five base pairs from the end of blunt
dsDNA fragments (OLIP87). Therefore N_s nts (with N_s
typically set to 5) of spacer are added to ends that we
intend to cut with a restriction enzyme. If the
plasmid is to be cut with a blunt-cutting enzyme, then
we do not add any spacer to the corresponding end of
the dsDNA fragment.

To choose the optimum site of overlap for the
oligo-nt fragments, first consider the anti-sense
strand of the DNA to be synthesized, including any
spacers at the ends, written (in upper case) from 5' to
3' and left-to-right. N.B.: The N_w nt long overlap
window can never include bases that are to be
variegated. N.B.: The N_w nt long overlap should not be
palindromic lest single DNA molecules prime themselves.
Place a N_w nt long window as close to the center of the
anti-sense sequence as possible. Check to see whether
one or more codons within the window can be changed to
increase the GC content without: a) destroying a needed
restriction site, b) changing amino acid sequence, or
c) making the overlap region palindromic. If possible,
change some AT base pairs to GC pairs. If the GC
content of the window is less than 50%, slide the
window right or left as much as Q_w nts to maximize the
number of C's and G's inside the window, but without
including any variegated bases. For each trial setting
of the overlap window, maximize the GC content by
silent codon changes, but do not destroy wanted
restriction sites or make the overlap palindromic. If
the best setting still has less than 50% GC, enlarge
the window to N_w+2 nts and place it within five nts of
the center to obtain the maximum GC content. If
enlarging the window one or two nts will increase the
GC content, do so, but do not include variegated bases.
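
The sliding-window part of this choice can be sketched as
below. The sketch is deliberately partial: it scans window
positions within a given slack of the center, rejects
windows that contain variegated positions or that are
palindromic, and keeps the one with the most G and C bases
(ties go to the position nearest the center). It does not
attempt the silent codon changes or the window enlargement
described above, and the function name and arguments are
hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def pick_overlap(antisense, n_w=10, q_w=0, variegated=frozenset()):
    """Return the 0-based start of the best overlap window, or None.

    antisense:  top strand, written 5'-to-3'
    n_w:        window length in nts
    q_w:        how far the window may slide from center, in nts
    variegated: 0-based positions that must stay outside the window
    """
    seq = antisense.upper()
    center = (len(seq) - n_w) // 2
    first = max(0, center - q_w)
    last = min(len(seq) - n_w, center + q_w)
    best = None
    for start in range(first, last + 1):
        window = seq[start:start + n_w]
        if any(p in variegated for p in range(start, start + n_w)):
            continue                      # window touches variegated bases
        if window == revcomp(window):
            continue                      # palindromic: strand would self-prime
        gc_count = sum(base in "GC" for base in window)
        key = (gc_count, -abs(start - center))
        if best is None or key > best[0]:
            best = (key, start)
    return None if best is None else best[1]


if __name__ == "__main__":
    seq = "A" * 15 + "GGCCGGCCAG" + "A" * 5
    print(pick_overlap(seq, n_w=10, q_w=5))
```

A fuller implementation would wrap this scan in a loop over
silent codon substitutions, re-checking restriction sites
after each change.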

Underscore the anti-sense strand from the 5' end
up to the right edge of the window. Write the
complementary sense sequence 3'-to-5' and left-to-right
and in lower case letters, under the anti-sense strand
starting at the left edge of the window and continuing
all the way to the right end of the anti-sense strand.

We will synthesize the underscored anti-sense
strand and the part of the sense strand that we wrote.
These two fragments, complementary over the length of
the window of high GC content, are mixed in equimolar
quantities and annealed. These fragments are extended
with Klenow fragment and all four deoxynucleotide
triphosphates to produce ds blunt-ended DNA. This DNA
can be cut with appropriate restriction enzymes to
produce the cohesive ends needed to ligate the fragment
to other DNA.
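
The relationship between the two synthetic strands and the
extended product can be checked with a small simulation.
This is an idealized sketch (perfect annealing and complete
extension are assumed), the function names are hypothetical,
and the 30-nt test sequence is invented for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_oligos(antisense, start, n_w):
    """Return the two oligo-nts, each written 5'-to-3'.

    top:    anti-sense strand from its 5' end through the overlap
    bottom: sense strand from its 5' end through the overlap
    """
    top = antisense[:start + n_w]
    bottom = revcomp(antisense[start:])
    return top, bottom


def klenow_extend(top, bottom, n_w):
    """Fill in both 5' superoverhangs; return the full top strand."""
    if top[-n_w:] != revcomp(bottom[-n_w:]):
        raise ValueError("3' ends are not complementary over the overlap")
    # The top strand is extended using the rest of the bottom strand
    # as template.
    return top + revcomp(bottom[:-n_w])


if __name__ == "__main__":
    gene = "ATGGCTAAAGGCCGGTCAGTTTGACCATGC"
    top, bottom = split_oligos(gene, start=10, n_w=10)
    print(top, bottom, klenow_extend(top, bottom, 10) == gene)
```

The bottom strand of the product is simply the reverse
complement of the returned top strand, so the duplex is
blunt at both ends.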

Sec. 5.2: DNA synthesis and purification methods:



The present invention is not limited to any
particular method of DNA synthesis or construction.
The following procedures exemplify one way to achieve
the goals of the present invention.

DNA is synthesized on a Milligen 7500 DNA
synthesizer (Milligen, a division of Millipore
Corporation, Bedford, MA) by standard procedures.
Software to control the synthesizer and to keep records
of each synthesis is supplied by Milligen.



The following reagents are supplied by Milligen:

1) 1H-tetrazole in acetonitrile,

2) 3% (v/v) dichloroacetic acid in
dichloroethane,

3) Acetic anhydride in 2,6-lutidine/acetonitrile
(1:1:8),

4) 6.5% dimethylaminopyridine in acetonitrile,

5) 0.1M iodine in 2,6-
lutidine/water/tetrahydrofuran (8:8:84),

6) 3% (v/v) triethylamine in acetonitrile,

7) DMT-dAdenosine(Bz) cyanoethylphosphoramidite

8) DMT-dCytidine(Bz) cyanoethylphosphoramidite

9) DMT-dGuanosine(iBu) cyanoethylphosphoramidite

10) DMT-dThymidine cyanoethylphosphoramidite

11) Acetonitrile, anhydrous

Tetrazole and acetonitrile are stored over
molecular sieves to sequester water.

Phosphoramidites are dissolved in anhydrous
acetonitrile (Milligen) at 0.1 g/ml. All other
acetonitrile used in the syntheses is "Low-water
Acetonitrile" supplied by J. T. Baker Chemical Company
(Phillipsburg, NJ). Synthesis columns containing
supports charged with an initial base for each of A, C,
G, and T are obtained from Milligen in two types, high-
loading and low-loading. High-loading columns are used
for syntheses of oligo-nts containing up to 60 bases
and contain between 35 and 70 micromoles of amidite/g
of support. The exact amount varies from lot to lot.
Low-loading columns containing between 4 and 7
micromoles amidite/g support are used for syntheses of
oligo-nts containing 60 bases or more.

The Milligen 7500 has seven vials from which
phosphoramidites may be taken. Normally, the first
four contain A, C, T, and G. The other three vials
may contain unusual bases such as inosine or mixtures
of bases, the so-called "dirty bottle". The standard
software allows programmed mixing of two, three, or
four bases in equimolar quantities.

When a synthesis is complete, the DNA is removed
from the support by incubating the supports in 1 ml of
fresh 28-30% ammonium hydroxide solution (EM Science, a
division of EM Industries, Inc., Cherry Hill, NJ) for
15 hours at 50 degrees C. The solution is dried under
vacuum and the DNA resuspended in 200 microliters of
HPLC-grade water (Baker-Analyzed Reagent, J.T.
Baker Chemical Co.) and is purified by high-pressure
liquid chromatography (HPLC) or PAGE.

With low-loading supports, a 55-base-long oligo-nt
is typically obtained at 1-2% of theoretical yield,
i.e. 10 ug; a 100-base-long oligo-nt is typically
obtained in 0.5% of theoretical yield, i.e. 5 ug. With
high-loading supports, 1 mg of a 20-base-long oligo-nt
is typically obtained.



The present invention is not limited to any
particular method of purifying DNA for genetic
engineering. HPLC is used for both oligo-nts and
fragments of several kb. Alternatively, agarose gel
electrophoresis and electroelution on an IBI device
(International Biotechnologies, Inc., New Haven, CT) is
used to purify large dsDNA fragments. For oligo-nts,
PAGE and electroelution with an Epigene device (Epigene
Corp., Baltimore, MD) are an alternative to HPLC. One
alternative for DNA purification is HPLC on a Waters
(division of Millipore Corporation) HPLC system using
the GenPak FAX column. A sample of 100 picograms
(pg) to 10 ug can be loaded and recovered in 10%-80%
yield. The recovery varies with the size and
concentration of the DNA, and whether it is single or
double stranded. A NAP5 column from Pharmacia (Sweden)
is used to desalt DNA eluted from the GenPak column.
After passage over the NAP5 column, the DNA solution is
vacuum desiccated.

Sec. 6.1: Cloning of known osp-ipbd gene into OCV:

In this section, we clone the osp-ipbd gene or the
display probe that we have designed. In the preferred
method, the synthetic gene is constructed using
plasmids that are transformed into bacterial cells by
standard methods (MANI82, p250) or slightly modified
standard methods. Alternatively, DNA fragments derived
from nature are operably linked to other fragments of
DNA derived from nature or to synthetic DNA fragments.
In most cases in the preferred method, gene synthesis
involves construction of a series of plasmids
containing larger and larger segments of the complete
gene. Each plasmid that contains a newly added portion
of the osp-ipbd gene or of the display probe is tested
by restriction digestion. Plasmids having the expected
restriction digestion pattern are sequenced in the
region of the latest alteration to confirm the
synthesis.

If, for convenience, small plasmids were used for
gene synthesis, the complete osp-ipbd gene or display
probe is subcloned into the OCV at this point.








Sec. 6.2: Cloning of Random DNA (Potential osp) Into
Display Probe:



If random DNA and phenotypic selection or
screening are used to obtain a GP(IPBD), then we clone
random DNA into one of the restriction sites that was
designed into the display probe.

The random DNA may be obtained in a variety of
ways. Degenerate synthetic DNA is one possibility.
Alternatively, pseudorandom DNA may be taken from
nature. If, for example, an Sph I site (GCATG/C) has
been designed into the display probe at one end of the
ipbd fragment, then we would use Nla III (CATG/) to
partially digest some DNA that contains a wide variety
of sequences, generating a wide variety of fragments with
CATG 3' overhangs. Preferably, the display probe is
designed with different restriction sites at each end
of the ipbd gene so that random DNA can be cloned at
either end at the user's discretion. The genome of an
organism would be a suitable source of DNA with high
sequence diversity.
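
The fragment population produced by such a digestion can be
previewed in silico. The sketch below considers the top
strand only and cuts immediately after every CATG, as
Nla III does; a partial digest is modeled simply as every
stretch between two cut points (or ends). The function
names and the 13-nt test sequence are hypothetical.

```python
def nla3_cut_points(dna):
    """0-based positions just after each CATG on the top strand."""
    dna = dna.upper()
    cuts = []
    i = dna.find("CATG")
    while i != -1:
        cuts.append(i + 4)
        i = dna.find("CATG", i + 1)
    return cuts


def complete_digest(dna):
    """Fragments obtained when every Nla III site is cut."""
    dna = dna.upper()
    edges = sorted({0, *nla3_cut_points(dna), len(dna)})
    return [dna[a:b] for a, b in zip(edges, edges[1:])]


def partial_digest(dna):
    """All fragments obtainable when any subset of sites is cut."""
    dna = dna.upper()
    edges = sorted({0, *nla3_cut_points(dna), len(dna)})
    return [dna[edges[i]:edges[j]]
            for i in range(len(edges))
            for j in range(i + 1, len(edges))]


if __name__ == "__main__":
    print(complete_digest("AACATGTTCATGA"))
    print(len(partial_digest("AACATGTTCATGA")))
```

Fragments bounded by two cut points carry CATG 3' overhangs
at both ends and can therefore be ligated into an Sph I-cut
display probe in either orientation.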

A plasmid carrying the display probe is digested
with the appropriate restriction enzyme and the
fragmented, random DNA is annealed and ligated by
standard methods. The ligated plasmids are used to
transform cells that are grown and selected for
expression of the antibiotic-resistance gene. Plasmid-
bearing GPs are then selected for the display-of-IPBD
phenotype by the procedure given in Sec. 15 of the
present invention using AfM(IPBD) as if it were the
target. Sec. 15 is designed to isolate GP(PBD)s that
bind to a target from a large population that do not
bind. Use of the procedure of Sec. 15 to isolate a
genetic construction that leads to the display of a
single type of IPBD is different from the designed use
in one important way: any GP that displays the IPBD
will bind tightly and GPs that do not display IPBD will
not bind, hence any reasonable amount of AfM(IPBD) on
the matrix will identify a successful clone.

As an alternative to selecting GP(IPBD)s through
binding to an affinity column, we can isolate colonies
or plaques and screen through use of one of the methods
listed in Sec. 8 to identify clonal isolates that
display IPBD on the GP outer surface.






Sec. 7: Harvest of GPs:



After transforming cells with ligated cloning
vectors, we first grow the GPs in non-selective
conditions to allow expression of the antibiotic-
resistance markers on the cloning vector. After a
grow-out, we apply selective pressure to kill
untransformed cells.

GPs are harvested by methods appropriate to the GP
at hand, generally, centrifugation to pelletize GPs and
resuspension of the pellets in sterile medium (cells)
or buffer (spores or phage).

Sec. 8: Verification of Display Strategy:

The harvested packages are now tested to determine
whether the IPBD is present on the surface. In any
tests of GPs for the presence of IPBD on the GP
surface, any ions or cofactors known to be essential
for the stability of IPBD or AfM(IPBD) must be included
at appropriate levels. The tests can be done: a) by
affinity labeling, b) enzymatically, c)
spectrophotometrically, d) by affinity separation, or
e) by affinity precipitation. The AfM(IPBD) in this
step is one picked to have strong affinity
(preferably, Kd < 10^-11 M) for the IPBD molecule and
little or no affinity for the wtGP. For example, if
BPTI were the IPBD, trypsin, anhydrotrypsin, or
antibodies to BPTI could be used as the AfM(BPTI) to
test for the presence of BPTI. Anhydrotrypsin, a
trypsin derivative with serine 195 converted to
dehydroalanine, has no proteolytic activity but retains
its affinity for BPTI (AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the
surface of the GP is demonstrated through the use of a
soluble, labeled derivative of a AfM(IPBD) with high
affinity for IPBD. The label could be: a) a
radioactive atom such as 125I, b) a chemical entity
such as biotin, or c) a fluorescent entity such as
rhodamine or fluorescein. The labeled derivative of
AfM(IPBD) is denoted as AfM(IPBD)*. The preferred
procedure is:

1) mix AfM(IPBD)* with GPs that are to be tested
for the presence of IPBD; conditions of mixing
should favor binding of IPBD to AfM(IPBD)*,

2) separate GPs from unbound AfM(IPBD)* by use of:

a) a molecular sizing filter that will pass
AfM(IPBD)* but not GPs,

b) centrifugation, or

c) a molecular sizing column (such as
Sepharose or Sephadex) that retains free
AfM(IPBD)* but not GPs,

3) quantitate the AfM(IPBD)* bound by GPs.



Alternatively, if the IPBD has a known biochemical
activity (enzymatic or inhibitory), its presence on the
GP can be verified through this activity. For example,
if the IPBD were BPTI, then one could use the
stoichiometric inactivation of trypsin not only to
demonstrate the presence of BPTI, but also to
quantitate the amount.



If the IPBD has strong, characteristic absorption
bands in the visible or UV that are distinct from
absorption by the wtGP, then another alternative for
measuring the IPBD displayed on the GP is a
spectrophotometric measurement. For example, if IPBD
were azurin, the visible absorption could be used to
identify GPs that display azurin.

Another alternative is to label the GPs and
measure the amount of label retained by immobilized
AfM(IPBD). For example, the GPs could be grown with a
radioactive precursor, such as 32P or 3H-thymidine,
and the radioactivity retained by immobilized AfM(IPBD)
measured.

Another alternative is to use affinity
chromatography; the ability of a GP bearing the IPBD to
bind a matrix (cf. Sec. 15.1) that supports a AfM(IPBD)
is measured by reference to the wtGP.

Another alternative for detecting the presence of
IPBD on the GP surface is affinity precipitation.

If random DNA has been used, then the procedures
of Sec. 15 are used to obtain a clonal isolate that has
the display-of-IPBD phenotype. Alternatively, clonal
isolates may be screened for the display-of-IPBD
phenotype. The tests of this step are applied to one
or more of these clonal isolates.

If no isolates that bind to the affinity molecule
are obtained, we take corrective action as disclosed in
Sec. 9.

If one or more of the tests above indicates that
the IPBD is displayed on the GP surface, we verify that
the binding of molecules having known affinity for IPBD
is due to the chimeric osp-ipbd gene through the use of
standard genetic and biochemical techniques, such as:

1) transferring the osp-ipbd gene into the parent
GP to verify that osp-ipbd confers binding,

2) deleting the osp-ipbd gene from the isolated GP
to verify that loss of osp-ipbd causes loss of
binding,

3) showing that binding of GPs to AfM(IPBD)
correlates with [XINDUCE] (in those cases that
expression of osp-ipbd is controlled by
[XINDUCE]), and

4) showing that binding of GPs to AfM(IPBD) is
specific to the immobilized AfM(IPBD) and not to
the support matrix.









Variation of: a) binding of GPs by soluble AfM(IPBD)*,
b) absorption caused by IPBD, and c) biochemical
reactions of IPBD are linear in the amount of IPBD
displayed. Presence of IPBD on the GP surface is
indicated by a strong correlation between [XINDUCE] and
the reactions that are linear in the amount of IPBD.
Leakiness of the promoter is not likely to present
problems of high background with assays that are linear
in the amount of IPBD. These experiments may be
quicker and easier than the genetic tests.
Interpreting the effect of [XINDUCE] on binding to a
{AfM(IPBD)} column, however, may be problematic unless
the regulated promoter is completely repressed in the
absence of [XINDUCE]. The affinity retention of
GP(IPBD)s is not linear in the number of IPBDs/GP and
there may be, for example, little phenotypic difference
between GPs bearing 5 IPBDs and GPs bearing 50 IPBDs.
The demonstration that binding is to AfM(IPBD) and the
genetic tests are essential; the tests with XINDUCE are
optional.

We sequence the relevant ipbd gene fragment from
each of several clonal isolates to determine the
construction.

We establish the maximum salt concentration and pH
range for which the GP(IPBD) binds the chosen
AfM(IPBD). This is preferably done by measuring, as a
function of salt concentration and pH, the retention of
AfM(IPBD)* on molecular sizing filters that pass
AfM(IPBD)* but not GP.

If the IPBD is displayed on the outside of the GP,
and if that display is clearly caused by the introduced
osp-ipbd gene, we proceed to Part II; otherwise we must
analyze the result and adopt appropriate corrective
measures.

Sec. 9: Perfecting the Display System:

If we have attempted to fuse an ipbd fragment to a
natural osp fragment, our options are:

1) pick a different fusion to the same osp by

a) using opposite end of osp,

b) keeping more or fewer residues from osp in
the fusion; for example, in increments of 3
or 4 residues,

c) trying a known or predicted domain
boundary,

d) trying a predicted loop or turn position,

2) pick a different osp, or

3) switch to random DNA method.

If we have just tried the random DNA method
unsuccessfully, our options are:

1) choose a different relationship between ipbd
fragment and random DNA (ipbd first, random DNA
second or vice versa),

2) try a different degree of partial digestion, a
different enzyme for partial digestion, a
different degree of shearing or a different source
of natural DNA, or

3) switch to the natural OSP method.

If all reasonable OSPs of the current GP have been
tried and the random DNA method has been tried, both
without success, we pick a new GP.

Summary of Part I:

In Part I, we have constructed a GP(IPBD).
Although the target material is not picked until Part
III, we have already discussed the general properties
of targets that influence the choice of IPBD. The user
may use the first GP(IPBD) as the starting point for
design and construction of other GPs: GP(IPBD1),
GP(IPBD2), etc. The different IPBDs might differ in
charge and size in such a way that, for any target, at
least one of the GP(IPBD)s will be appropriate as a
starting point to develop a protein that will bind to
that target.

Part II

Sec. 10.0: Affinity Separation Means:



In Part II we optimize an affinity separation
system that will be used in Part III to enrich a
population of GP(vgPBD)s for those GP(PBD)s that
display PBDs with increased affinity for the target.

Affinity chromatography is the preferred means,
but FACS, electrophoresis, or other means may also be
used.

Sec. 10.1: Optimization of Affinity Chromatography
Separation:

For linear gradients, elution volume and eluant
concentration are directly related. Changes in eluant
concentration cause GPs to elute from the column.
Elution volume, however, is more easily measured and
specified. It is to be understood that the eluant
concentration is the agent causing GP release and that
an eluant concentration can be calculated from an
elution volume and the specified gradient.
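
The conversion from elution volume to eluant concentration can be sketched as follows. This is a minimal illustration that assumes a single linear gradient followed by a hold at the final concentration; the function name, its arguments, and the numerical values (10 mM to 1000 mM over 5 volume units) are hypothetical and are not taken from the specification.

```python
def eluant_concentration(elution_volume, start_conc, end_conc, gradient_volume):
    """Eluant concentration at a given elution volume.

    Assumes a linear gradient from start_conc to end_conc spread over
    gradient_volume, followed by a hold at end_conc (an assumption).
    """
    if gradient_volume <= 0:
        raise ValueError("gradient_volume must be positive")
    if elution_volume <= 0:
        return start_conc
    if elution_volume >= gradient_volume:
        return end_conc
    fraction = elution_volume / gradient_volume
    return start_conc + fraction * (end_conc - start_conc)


# Hypothetical example: halfway through a 10 mM to 1000 mM KCl gradient.
conc_mid = eluant_concentration(2.5, 10.0, 1000.0, 5.0)
```

Because the relation is linear, specifying an elution volume and the gradient is equivalent to specifying the eluant concentration.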

Using a specified elution regime, we compare the
elution volumes of GP(IPBD)s with the elution volumes
of wtGP on affinity columns supporting AfM(IPBD).
Comparisons are made at various: a) amounts of IPBD/GP,
b) densities of AfM(IPBD)/(volume of matrix) (DoAMoM),
c) initial ionic strengths, d) elution rates, e)
amounts of GP/(volume of support), f) pHs, and g)
temperatures, because these are the parameters most
likely to affect the sensitivity and efficiency of the
separation. We then pick those conditions giving the
best separation.

We do not optimize pH or temperature; rather we
record optimal values for the other parameters for one
or more values of pH and temperature. The pH used must
be within the range of pH for which GP(IPBD) binds the
AfM(IPBD) that is being used in this step. The
conditions of intended use, specified by the user (Sec.
11), may include a specification of pH or temperature.
If pH is specified, then pH will not be varied in
eluting the column (Sec. 15.3). Decreasing pH may,
however, be used to liberate bound GPs from the matrix.
Similarly, if the intended use specifies a temperature,
we will hold the affinity column at the specified
temperature during elution, but we might vary the
temperature during recovery. If the intended use
specifies the pH or temperature, then we prefer that
the affinity separation be optimized for all other
parameters at the specified pH and temperature.

In the optimization devised in this step, we
preferably use a molecule known to have moderate
affinity for the IPBD (K_d in the range 10^[...] M to
10^[...] M) for the following reason. When populations
of GP(vgPBD)s are fractionated, there will be roughly
three subpopulations: a) those with no binding, b)
those that have some binding but can be washed off with
high salt or low pH, and c) those that bind very
tightly and must be rescued in situ. We optimize the
parameters to separate (a) from (b) rather than (b)
from (c). Let PBD_w be a PBD having weak binding to the
target and PBD_s be a PBD having strong binding. Higher
DoAMoM might, for example, favor retention of GP(PBD_w)
but also make it very difficult to elute viable
GP(PBD_s). We will optimize the affinity separation to
retain GP(PBD_w) rather than to allow release of
GP(PBD_s) because a tightly bound GP(PBD_s) can be
rescued by in situ growth. If we find that DoAMoM
strongly affects the elution volume, then in Part III
we may reduce the amount of target on the affinity
column when an SBD has been found with moderately
strong affinity (K_d on the order of 10^[...] M) for the
target.

In case the promoter of the osp-ipbd gene is not
regulated by a chemical inducer, we optimize DoAMoM,
the elution rate, and the amount of GP/volume of
matrix. If the optimized affinity separation is
acceptable, we proceed. If not, we must develop a
means to alter the amount of IPBD per GP. Among GPs
considered in the present invention, this case could
arise only for spores because regulatable promoters are
available for all other systems.

If the amount of IPBD/spore is too high, we could
engineer an operator site into the osp-ipbd gene. We
choose the operator sequence such that a repressor
sensitive to a small diffusible inducer recognizes the
operator. Alternatively, we could alter the Shine-
Dalgarno sequence to produce a lower homology with
consensus Shine-Dalgarno sequences. If the amount of
IPBD/spore is too low, we can introduce variability
into the promoter or Shine-Dalgarno sequences and
screen colonies for higher amounts of IPBD/spore.

In this step, we measure elution volumes of
genetically pure GPs that elute from the affinity
matrix as sharp bands that can be detected by UV
absorption. Alternatively, samples from effluent
fractions can be plated on suitable medium (cells or
spores) or on sensitive cells (phage) and colonies or
plaques counted.

Several values of IPBD/GP, DoAMoM, elution rates,
initial ionic strengths, and loadings should be
examined. The following is only one of many ways in
which the affinity separation could be optimized. We
anticipate that optimal values of IPBD/GP and DoAMoM
will be correlated and therefore should be optimized
together. The effects of initial ionic strength,
elution rate, and amount of GP/(matrix volume) are
unlikely to be strongly correlated, and so they can be
optimized independently.

For each set of parameters to be tested, the
column is eluted in a specified manner. For example,
we may use a regime called Elution Regime 1: a KCl
gradient runs from 10 mM to maximum allowed for the
GP(IPBD) viability in 100 fractions of 0.05 V_V,
followed by 20 fractions of 0.05 V_V at maximum allowed
KCl; pH of the buffer is maintained at the specified
value with a convenient buffer such as phosphate, Tris,
or MOPS. Other elution regimes can be used; what is
important is that the conditions of this optimization
be similar to the conditions that are used in Part III
for selection for binding to target (Sec. 15.3) and
recovery of GPs from the chromatographic system (Sec.
15.4).



When the osp-ipbd gene is regulated by [XINDUCE],
IPBD/GP can be controlled by varying [XINDUCE].
Appropriate values of [XINDUCE] depend on the identity
of XINDUCE and the promoter; if, for example, XINDUCE
is isopropylthiogalactoside (IPTG) and the promoter is
lacUV5, then [IPTG] = 0, 0.1 uM, 1.0 uM, 10.0 uM, 100.0
uM, and 1.0 mM would be appropriate levels to test.
The range of variation of [XINDUCE] is extended until
an optimum is found or an acceptable level of
expression is obtained.

DoAMoM is varied from the maximum that the matrix
material can bind to 1% or 0.1% of this level in
appropriate steps. We anticipate that the efficiency
of separation will be a smooth function of DoAMoM so
that it is appropriate to cover a wide range of values
for DoAMoM with a coarse grid and then explore the
neighborhood of the approximate optimum with a finer
grid.
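
The coarse-then-fine exploration of DoAMoM can be sketched as below. This is only an illustration under stated assumptions: the grids are spaced logarithmically, the separation quality is summarized by a single score supplied by the user, and `toy_score` is a hypothetical stand-in for a real measurement. None of these names appear in the specification.

```python
import math


def log_grid(low, high, points):
    """Return `points` values evenly spaced on a log10 scale from low to high."""
    step = (math.log10(high) - math.log10(low)) / (points - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(points)]


def coarse_then_fine(score, max_density, min_fraction=0.001,
                     coarse_points=7, fine_points=9):
    """Scan a coarse grid, then refine between the neighbors of the best point."""
    coarse = log_grid(max_density * min_fraction, max_density, coarse_points)
    best_index = max(range(len(coarse)), key=lambda i: score(coarse[i]))
    low = coarse[max(best_index - 1, 0)]
    high = coarse[min(best_index + 1, len(coarse) - 1)]
    fine = log_grid(low, high, fine_points)
    return max(fine, key=score)


def toy_score(density):
    # Hypothetical smooth response that peaks at a density of 3.0 units.
    return -(math.log10(density) - math.log10(3.0)) ** 2


best = coarse_then_fine(toy_score, max_density=100.0)
```

The default `min_fraction` of 0.001 corresponds to the 0.1% lower bound mentioned in the text; a smooth response is what makes the two-stage search reasonable.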

Several values of initial ionic strength are
tested, such as 1.0 mM, 5.0 mM, 10.0 mM and 20.0 mM.
Low ionic strength favors binding between oppositely
charged groups, but could also cause GP to precipitate.

The elution rate is varied, by successive factors
of 1/2, from the maximum attainable rate to 1/16 of
this value. If the lowest elution rate tested gives
the best separation, we test lower elution rates until
we find an optimum or adequate separation.
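
The elution-rate schedule described here can be written out as a short sketch. The function names and the separation scores are hypothetical; only the halving from the maximum rate down to 1/16 of it comes from the text.

```python
def elution_rates(max_rate, halvings=4):
    """Rates from max_rate down to max_rate / 2**halvings, halving each step."""
    return [max_rate / (2 ** k) for k in range(halvings + 1)]


def needs_lower_rates(rates, separation_scores):
    """True when the lowest rate tested gave the best separation score."""
    best = max(range(len(rates)), key=lambda i: separation_scores[i])
    return rates[best] == min(rates)


rates = elution_rates(16.0)
# Hypothetical scores that keep improving as the rate drops.
extend_search = needs_lower_rates(rates, [0.1, 0.2, 0.4, 0.5, 0.7])
```

When `extend_search` is true, the schedule would be continued below 1/16 of the maximum rate, as the paragraph above prescribes.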



The goal of the optimization is to obtain a sharp
transition between bound and unbound GPs, triggered by
increasing salt or decreasing pH or a combination of
both. This optimization need be performed only: a) for
each temperature to be used, b) for each pH to be used,
and c) when a new GP(IPBD) is created.



Sec. 10.2: Measuring the sensitivity of affinity
separation:

Once the values of IPBD/GP, DoAMoM, initial ionic
strength, elution rate, and amount of GP/(volume of
affinity support) have been optimized, we determine the
sensitivity of the affinity separation (C_sensi) by the
following procedure that measures the minimum quantity
of GP(IPBD) that can be detected in the presence of a
large excess of wtGP. The user chooses a number of
separation cycles, denoted N_chrom, that will be
performed before an enrichment is abandoned;
preferably, N_chrom [...] and N_chrom
must be greater than 4. Enrichment can be terminated
by isolation of a desired GP(SBD) before N_chrom passes.

The measurement of sensitivity is significantly
expedited if GP(IPBD) and wtGP carry different
selectable markers because such markers allow easy
identification of colonies obtained by plating
fractions obtained from the chromatography column. For
example, if wtGP carries kanamycin resistance and
GP(IPBD) carries ampicillin resistance, we can plate
fractions from a column on non-selective media suitable
for the GP. Transfer of colonies onto ampicillin- or
kanamycin-containing media will determine the identity
of each colony.

Mixtures of GP(IPBD) and wtGP are prepared in the
ratios of 1:V_lim, where V_lim ranges by an appropriate
factor (e.g. 1/10) over an appropriate range, typically
10^^ through 10**. Large values of V_lim are tested
first; once a positive result is obtained for one value
of V_lim, smaller values of V_lim need not be tested.
Each mixture is applied to a column supporting, at the
optimal DoAMoM, an AfM(IPBD) having high affinity for
IPBD and the column is eluted by the specified elution
regime, such as Elution Regime 1. The last fraction
that contains viable GPs and an inoculum of the column
matrix material are cultured. If GP(IPBD) and wtGP
have different selectable markers, then transfer onto
selection plates identifies each colony. If GP(IPBD)
and wtGP have no selectable markers or the same
selectable markers, then a number (e.g. 32) of GP
clonal isolates are tested for presence of IPBD by the
techniques discussed in Sec. 3. If IPBD is not
detected on the surface of any of the isolated GPs,
then GPs are pooled from: a) the last few (e.g. 3 to 5)
fractions that contain viable GPs, and b) an inoculum
taken from the column matrix. The pooled GPs are
cultured and passed over the same column and enriched
for GP(IPBD) in the manner described. This process is
repeated until N_chrom passes have been performed, or
until the IPBD has been detected on the GPs. If
GP(IPBD) is not detected after N_chrom passes, V_lim is
decreased and the process is repeated.



Once a value for V_lim is found that allows
recovery of GP(IPBD)s, the factor by which V_lim is
varied is reduced and additional values are tested
until V_lim is known to within a factor of two.

C_sensi equals the highest value of V_lim for which
the user can recover GP(IPBD) within N_chrom passes.
The number of chromatographic cycles (K_cyc) that were
needed to isolate GP(IPBD) gives a rough estimate of
C_eff; C_eff is approximately the K_cyc-th root of V_lim:

$$C_{eff} \approx \exp\left(\frac{\log_e V_{lim}}{K_{cyc}}\right)$$

For example, if V_lim were 4.0 x 10^8 and three
separation cycles were needed to isolate GP(IPBD), then
C_eff = (approx.) 736.
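
The worked example can be checked numerically with the short sketch below. The function name is illustrative; the formula is the K_cyc-th root stated above, and the inputs are the values from the example.

```python
import math


def estimate_ceff(v_lim, k_cyc):
    """Rough per-cycle enrichment: the k_cyc-th root of v_lim."""
    return math.exp(math.log(v_lim) / k_cyc)


# Values from the example in the text: V_lim = 4.0e8, three cycles.
ceff = estimate_ceff(4.0e8, 3)
```

The result lies between 736 and 737, in agreement with the approximate value quoted in the text.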

Sec. 10.3: Measuring the efficiency of separation:

To determine C_eff more accurately, we determine
the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD)
column that yields approximately equal amounts of
GP(IPBD) and wtGP after elution. We prepare mixtures
of GP(IPBD) and wtGP in ratios GP(IPBD):wtGP :: 1:Q; we
start Q at twenty times the approximate C_eff found in
Sec. 10.2. A 1:Q mixture of GP(IPBD) and wtGP is
applied to an AfM(IPBD) column and eluted by the
specified elution regime, such as Elution Regime 1. A
sample of the last fraction that contains viable GPs is
plated at a dilution that gives well separated colonies
or plaques. The presence of IPBD or the osp-ipbd gene
in each colony or plaque can be determined by a number
of standard methods, including: a) use of different
selectable markers, b) nitrocellulose filter lift of
GPs and detection with AfM(IPBD)* (AUSU87), or c)
nitrocellulose filter lift of GPs and detection with
radiolabeled DNA that is complementary to the osp-ipbd
gene (AUSU87). Let F be the fraction of GP(IPBD)
colonies found in the last fraction containing viable
GPs. When a Q is found such that 0.20 < F < 0.80, then



C_eff [...]



If F < 0.2, then we reduce Q by an appropriate factor
(e.g. 1/10) and repeat the procedure. If F > 0.8, then
we increase Q by an appropriate factor (e.g. 2) and
repeat the procedure.
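
The adjustment of Q can be sketched as a single decision step. The function name and default arguments are illustrative; the thresholds (0.2 and 0.8) and the example factors (1/10 and 2) are those given in the text. The text does not say how F exactly equal to 0.2 or 0.8 is handled, so treating those values as acceptable is an assumption of this sketch.

```python
def next_q(q, f, low=0.2, high=0.8, decrease=0.1, increase=2.0):
    """Return (new_q, done) for one round of the Q search.

    f is the observed fraction of GP(IPBD) colonies in the last
    fraction containing viable GPs.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if f < low:
        return q * decrease, False   # too few GP(IPBD): reduce Q
    if f > high:
        return q * increase, False   # too many GP(IPBD): increase Q
    return q, True                   # F is in the accepted window


# Hypothetical round: Q = 1000 gave F = 0.1, so Q is reduced tenfold.
new_q, done = next_q(1000.0, 0.1)
```

Repeating the step with freshly measured F values continues until `done` is true.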
Other Separation Means

Other separation means are optimized in a manner
parallel to that used for affinity chromatography.



FACS is likely to be most appropriate for
bacterial cells and spores because the sensitivity of
the machines requires approximately 1000 molecules of
fluorescent label bound to each GP to accomplish a
separation. An appropriate commercial FACS machine is
a FACStar from Beckton-Dickinson, Mountain View, CA.
To optimize FACS separation of GPs, we use a derivative
of AfM(IPBD) that is labeled with a fluorescent
molecule, denoted AfM(IPBD)*. The variables that must
be optimized include: a) amount of IPBD/GP, b)
concentration of AfM(IPBD)*, c) ionic strength, d)
concentration of GPs, and e) parameters pertaining to
operation of the FACS machine. Because AfM(IPBD)* and
GPs interact in solution, the binding will be linear in
both [AfM(IPBD)*] and [displayed IPBD]. Preferably,
these two parameters are varied together. The other
parameters can be optimized independently. The
sensitivity and efficiency of the FACS separation are
determined in a manner parallel to those used for
chromatography.

Electrophoresis is most appropriate to
bacteriophage because of their small size. Serwer
(SERW87) has reviewed use of agarose-gel
electrophoresis to separate phage based on charge.
Electrophoresis is a preferred separation means if the
target is so small that chemically attaching it to a
column or to a fluorescent label would essentially
change the entire target. For example, chloroacetate
ions contain only seven atoms and would be essentially
altered by any linkage. GPs that bind chloroacetate
would become more negatively charged than GPs that do
not bind the ion and so these classes of GPs could be
separated.

The parameters to optimize for electrophoresis
include: a) IPBD/GP, b) concentration of gel material,
e.g. agarose, c) concentration of AfM(IPBD), d) ionic
strength, e) size, shape, and cooling capacity of the
electrophoresis apparatus, f) voltages and currents,
and g) concentration of GPs. Preferably, IPBD/GP and
[AfM(IPBD)] are varied at the same time and other
parameters are optimized independently.

In Part II we have determined optimal conditions
for separating GPs based on proteins displayed on the
GP surface. We have also determined the capabilities
of the affinity separation system. Knowledge of these

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be
conjugated to a suitable fluorescent dye or the
target must itself be fluorescent,

2) after any necessary fluorescent labeling, the
target must not react with water,

3) after any necessary fluorescent labeling, the
target material must not bind or degrade proteins
in a non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to
a suitable dye allows enough unaltered surface
area (generally at least 500 Å², excluding the
atom that is connected to the linker) for protein
binding.

If affinity electrophoresis is to be used, then:

1) the target must either be charged or of such a
nature that its binding to a protein will change
the charge of the protein,

2) the target material must not react with water,

3) the target material must not bind or degrade
proteins in a non-specific way, and

4) the target must be compatible with a suitable
gel material.



Possible target materials include, but are not
limited to:

1) horse heart myoglobin

[...]

5) asbestos

6) alpha-fetoprotein

7) ras proteins

8) [...] density lipoprotein

9) prostaglandin PGE2

10) alpha interferon

11) melittin

12) [...] adenylate cyclase toxin

13) aflatoxin B1

14) aspartame

15) haem

16) bilirubin

17) morphine

18) codeine

19) [...]ethane (DDT)

20) benzo(a)pyrene

21) actinomycin D

[...] any retroviral [...]

27) fibril or [...] species, several spirochete or
organisms causing syphilis, [...] relapsing fever

28) [...] enterotoxin protein

29) [...] hemolysin

30) Pseudomonas [...]

31) zeolites

32) cellul
The user of the present invention specifies
certain parameters of the intended use of the binding
protein:

1) the acceptable temperature range,

2) the acceptable pH range,

3) the acceptable concentrations of ions and
neutral solutes,

4) the maximum acceptable dissociation constant
for the target and the SBD:

$$K_T = \frac{[\mathrm{Target}]\,[\mathrm{SBD}]}{[\mathrm{Target{:}SBD}]}$$



In some cases, the user may require discrimination
between T, the target, and N, some non-target. Let

$$K_T = \frac{[T]\,[\mathrm{SBD}]}{[T{:}\mathrm{SBD}]}, \quad\text{and}\quad K_N = \frac{[N]\,[\mathrm{SBD}]}{[N{:}\mathrm{SBD}]},$$

then

$$\frac{K_T}{K_N} = \frac{[T]\,[N{:}\mathrm{SBD}]}{[N]\,[T{:}\mathrm{SBD}]}.$$

The user then specifies a maximum acceptable value for
the ratio K_T/K_N.
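
The discrimination criterion can be illustrated with a small sketch. The function names and all concentrations are hypothetical; the dissociation-constant definition is the one given above, and the comparison simply checks the ratio K_T/K_N against a user-chosen maximum.

```python
def dissociation_constant(free_ligand, free_sbd, complex_conc):
    """K = [ligand][SBD] / [ligand:SBD], all concentrations in molar."""
    if complex_conc <= 0:
        raise ValueError("complex concentration must be positive")
    return free_ligand * free_sbd / complex_conc


def meets_discrimination(k_t, k_n, max_ratio):
    """True when K_T / K_N does not exceed the user's maximum ratio."""
    return (k_t / k_n) <= max_ratio


# Hypothetical equilibrium concentrations (molar).
k_t = dissociation_constant(1.0e-9, 1.0e-6, 1.0e-6)   # target
k_n = dissociation_constant(1.0e-6, 1.0e-6, 1.0e-6)   # non-target
ok = meets_discrimination(k_t, k_n, max_ratio=1.0e-2)
```

A smaller K_T/K_N means tighter binding to the target relative to the non-target, which is why the user specifies an upper bound on the ratio.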

The target material must be stable under the
specified conditions of pH, temperature, and solution
conditions.

If the target material is a protease, one must
consider the following points:

1) a highly specific protease can be treated like
any other target,

2) a general protease, such as subtilisin, may
degrade the OSPs of the GP including osp-PBDs;
there are several alternative ways of dealing with
general proteases, including: a) a chemical
inhibitor may be used to prevent proteolysis (e.g.
phenylmethylfluorosulfate (PMFS) that inhibits
serine proteases), b) one or more active-site
residues may be mutated to create an inactive
protein (e.g. a serine protease in which the
active serine is mutated to alanine), or c) one or
more active-site amino acids of the protein may be
chemically modified to destroy the catalytic
activity (e.g. a serine protease in which the
active serine is converted to anhydroserine),

3) SBDs selected for binding to a protease need
not be inhibitors; SBDs that happen to inhibit
the protease target are a fairly small subset of
SBDs that bind to the protease target,

4) the more we modify the target protease, the
less likely we are to obtain an SBD that inhibits
the target protease, and

5) if the user requires that the SBD inhibit the
target protease, then the active site of the
target protease must not be modified any more than
necessary; inactivation by mutation or chemical
modification are preferred methods of inactivation
and a protein protease inhibitor becomes a prime
candidate for IPBD. For example, BPTI could be
mutated, by the methods of the present invention,
to bind to proteases other than trypsin (TANK77
and TSCH87).

Sec. 12.0: Choice of GP(IPBD):

The user must pick a GP(IPBD) that is suitable to
the chosen target according to the criteria of Sec. 2.
It is anticipated that a small collection of
GP(IPBD)s can be assembled such that, for any chosen
target, at least one member of the collection will be a
suitable starting point for engineering a protein that
binds to the chosen target by the methods of the
present invention.

If the pH, temperature, or other parameters of the
intended use of the selected SBD differ markedly from
the conditions used to optimize the affinity separation
for the chosen GP(IPBD), then the user should optimize
the affinity separation for conditions appropriate to
the intended use by the methods described in Part II.

Sec. 13: Identification of Family of PBDs, Related
to PPBD, to Be Generated

Sec. 13.1: Choosing residues on IPBD (or other PPBD)
to vary:

We choose residues in the IPBD to vary through
consideration of several factors, including: a) the 3D
structure of the IPBD, b) sequences homologous to IPBD,
and c) modeling of the IPBD and mutants of the IPBD.
Because the number of residues that could strongly
influence binding is always greater than the number
that can be varied simultaneously, the user must pick a
subset of those residues to vary at one time. The user
must also pick trial levels of variegation and
calculate the abundances of various sequences. The
list of varied residues and the level of variegation at
each varied residue are adjusted until the composite
variegation is commensurate with C_sensi and M_ntv.

We now consider the principles that guide our
choice of residues of the IPBD to vary. A key concept
is that only structured proteins exhibit specific
binding, i.e. can bind to particular chemical entities
to the exclusion of most others. Thus the residues to
be varied are chosen with an eye to preserving the
underlying IPBD structure. Substitutions that prevent
the PBD from folding will cause GPs carrying those
genes to bind indiscriminately so that they can easily
be removed from the population.

Burial of hydrophobic surfaces so that bulk water
is excluded is one of the strongest forces driving the
binding of proteins to other molecules. Bulk water can
be excluded from the region between two molecules only
if the surfaces are complementary. We must test as
many surfaces as possible to find one that is
complementary to the target. The selection-through-
binding isolates those proteins that are more nearly
complementary to some surface on the target. The
effective diversity of a variegated population is
measured by the number of different surfaces, rather
than the number of protein sequences. Thus we should
maximize the number of surfaces generated in our
population, rather than the number of protein
sequences.

In a hypothetical example, we consider a
hypothetical PBD, shown in Figure 4, binding to a
hypothetical target. Figure 4 is a 2D schematic of 3D
objects; by hypothesis, residues 1, 2, 4, 6, 7, 13, 14,
15, 20, 21, 22, 27, 29, 31, 33, 34, 36, 37, 38, and 39
of the IPBD are on the 3D surface of the IPBD, even
though shown well inside the circle. Proteins do not
have distinct, countable faces. Therefore we define an
"interaction set" to be a set of residues such that all
members of the set can simultaneously touch one
molecule of the target material without any atom of the
target coming closer than van der Waals distance to any
main-chain atom of the IPBD. The concept of a residue
"touching" a molecule of the target is discussed below.
One hypothetical interaction set, Set A, in Figure 4
comprises residues 6, 7, 20, 21, 22, 33, and 34,
represented by squares. Another hypothetical
interaction set, Set B, comprises residues 1, 2, 4, 6,
31, 37, and 39, represented by circles.

If we vary one residue, number 21 for example,
through all twenty amino acids, we obtain 20 protein
sequences and 20 different surfaces for interaction set
A. Note that residue 6 is in two interaction sets and
variation of residue 6 through all 20 amino acids
yields 20 versions of interaction set A and 20 versions
of interaction set B.

Now consider varying two residues, each through
all twenty amino acids, generating 400 protein
sequences. If the two residues varied were, for
example, number 1 and number 21, then there would be
only 40 different surfaces because interaction set A
does not depend on residue 1 and interaction set B does
not depend on residue 21. If the two residues varied,
however, were number 7 and number 21, then 400 surfaces
would be generated.

If N spatially separated residues are varied at
one time, 20 x N surfaces are generated. Variation of
N residues in the same interaction set yields 20^N
surfaces. For example, if N = 7, variation of
separated residues yields 140 surfaces while variation
of interacting residues yields 20^7 = 1.3 x 10^9
surfaces. Thus, to maximize the number of surfaces
generated when N residues are varied, all residues
should be in the same interaction set because variation
of several residues in one interaction set generates an
exponential number of surfaces while variation of
spatially separated surface residues generates only a
linear number.
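
The surface counts in this hypothetical example can be reproduced with a short sketch. The counting rule used here, summing 20^k over interaction sets that contain k of the varied residues, is an interpretation of the examples above rather than a formula given in the text; the set memberships are those of Set A and Set B.

```python
SET_A = {6, 7, 20, 21, 22, 33, 34}
SET_B = {1, 2, 4, 6, 31, 37, 39}


def count_surfaces(varied, interaction_sets=(SET_A, SET_B), alphabet=20):
    """Count distinct surfaces produced by varying the given residues.

    Each interaction set containing k varied residues contributes
    alphabet**k surfaces; sets with no varied residue contribute none.
    """
    total = 0
    for members in interaction_sets:
        k = len(set(varied) & members)
        if k:
            total += alphabet ** k
    return total


one_residue = count_surfaces({21})          # 20 versions of Set A
shared_residue = count_surfaces({6})        # 20 of Set A plus 20 of Set B
separated_pair = count_surfaces({1, 21})    # 20 + 20
interacting_pair = count_surfaces({7, 21})  # 20 * 20
whole_set = count_surfaces(SET_A - {6})     # six residues found only in Set A
```

Under this rule, residues in the same interaction set multiply the count while residues in different sets only add to it, which is the point of the comparison between 20 x N and 20^N.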
The amount of surface area involved in
protein-protein interactions ranges from 1000 Å² to
2000 Å², as summarized by Schulz and Schirmer (SCHU79,
p103ff). Individual amino acids have total surface
areas that depend mostly on type of amino acid and
weakly on conformation. These areas range from about
[...] Å² for glycine to about 360 Å² for tryptophan.

Averages of total surface area by amino acid type and
maximum exposed surface area of each amino acid type
for two typical proteins, hen egg white lysozyme (HEWL)
and T4 lysozyme (T4L), are shown in Table 6. From
these exposures, one can calculate that 1000 Å² on a
protein surface comprises between [...] and 30 amino
acids, depending on the amino acid types and the
protein 3D structure. Varied amino acid sequences, as
found in actual proteins, involve between 10 and 25
residues in forming 1000 Å² of protein surface. Schulz
and Schirmer estimate that 100 Å² of protein surface
can exhibit as many as 1000 different specific patterns
(SCHU79, p105). The number of surface patterns rises
exponentially with the area that can be varied
independently. One of the BPTI structures recorded in
the Brookhaven Protein Data Bank (6PTI), for example,
has a total exposed surface area of 3997 Å² (using the
method of Lee and Richards (LEE71) and a solvent
radius of 1.4 Å and atomic radii as shown in Table 7).
If we could vary this surface freely and if 100 Å² can
produce 1000 patterns, we could construct 10^120
different patterns by varying the surface of BPTI!
This calculation is intended only to suggest the huge
number of possible surface patterns based on a common
protein backbone.
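
The order-of-magnitude estimate can be checked with a brief sketch that works in logarithms to avoid very large numbers. The function name is illustrative; the inputs (3997 Å² of exposed surface, 1000 patterns per 100 Å²) are those stated in the text, and the assumption that patches vary independently is the one the text itself makes for this estimate.

```python
import math


def log10_pattern_count(total_area, patch_area=100.0, patterns_per_patch=1000.0):
    """log10 of the number of surface patterns, assuming independent patches."""
    patches = total_area / patch_area
    return patches * math.log10(patterns_per_patch)


# BPTI exposed surface area from the text, in square angstroms.
exponent = log10_pattern_count(3997.0)
```

The exponent comes out just under 120, consistent with the figure of 10^120 given above.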

One protein framework cannot, however, display all
possible patterns over any one particular 100 Å² of
surface merely by replacement of the side groups of
surface residues. The protein backbone holds the
varied side groups in approximately constant locations
so that the variations are not independent. We can,
nevertheless, generate a vast collection of different
protein surfaces by varying those protein residues that
face the outside of the protein.

Figure 5 shows BPTI in contact with myoglobin.
From this we can see that residues 3, 7, 6, 10, 13, 39,
41, and 42 can all simultaneously contact a molecule
the size and shape of myoglobin. Figure 5 also shows
that residue 49 cannot touch a single myoglobin
molecule simultaneously with any of the first set even
though all are on the surface of BPTI. It is not the
intent of the present invention, however, to use models
to determine which part of the target molecule will
actually be the site of binding by PBD.

If cassette mutagenesis is picked, the protein
residues to be varied are, preferably, close enough
together in sequence that the variegated DNA (vgDNA)
encoding all of them can be made in one piece. The
present invention is not limited to a particular length
of vgDNA that can be synthesized. With current
technology, a stretch of 60 amino acids (180 DNA bases)
can be spanned.

Further, when there is reason to mutate residues
further than sixty residues apart, one can use other
mutational means, such as single-stranded-
oligonucleotide-directed mutagenesis (BOTS85) using two
or more mutating primers.

Alternatively, to vary residues separated by more
than sixty residues, two cassettes may be mutated as
follows:

1) vgDNA having a low level of variegation (for
example, 20 to 400 fold variegation) is introduced
into one cassette in the OCV,

2) cells are transformed and cultured,

3) vg OCV DNA is obtained,

4) a second segment of vgDNA is inserted into a
second cassette in the OCV, and

5) cells are transformed and cultured, GPs are
harvested and subjected to selection-through-
binding.

The composite level of variation must not exceed the
prevailing capabilities to a) produce very large
numbers of independently transformed cells or b) detect
small components in a highly varied population. The
limits on the level of variegation are discussed in
Sec. 13.2.
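
The constraint on composite variegation can be sketched as follows. This is an illustration under explicit assumptions: the composite level is taken to be the product of the levels introduced in each cassette, and the two capabilities are represented by hypothetical numerical limits (`max_transformants` and `sensitivity`). The actual limits are those discussed in Sec. 13.2, which this sketch does not reproduce.

```python
def composite_variegation(levels):
    """Product of the variegation levels of the individual cassettes (assumed)."""
    product = 1
    for level in levels:
        if level < 1:
            raise ValueError("each variegation level must be at least 1")
        product *= level
    return product


def within_limits(levels, max_transformants, sensitivity):
    """True when the composite level exceeds neither assumed limit."""
    composite = composite_variegation(levels)
    return composite <= max_transformants and composite <= sensitivity


# Hypothetical plan: 400-fold in the first cassette, 8000-fold in the second.
plan = [400, 8000]
feasible = within_limits(plan, max_transformants=10**8, sensitivity=10**7)
```

If `feasible` were false, the list of varied residues or the level of variegation at some residue would be reduced before proceeding.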

Here we assemble the data about the IPBD and the
target that are useful in deciding which residues to
vary in the variegation cycle:

1) 3D structure, or at least a list of residues on
the surface of the IPBD,

2) list of sequences homologous to IPBD, and

3) model of the target molecule or a stand-in for
the target.



These data and an understanding of the behavior of
different amino acids in proteins will be used to
answer two questions:

1) which residues of the IPBD are on the outside
and close enough together in space to touch the
target simultaneously?

2) which residues of the IPBD can be varied with
high probability of retaining the underlying IPBD
structure?

Although an atomic model of the target material
(obtained through X-ray crystallography, NMR, or other
means) is preferred in such examination, it is not
necessary. For example, if the target were a protein
of unknown 3D structure, it would be sufficient to know
the molecular weight of the protein and whether it were
a soluble globular protein, a fibrous protein, or a
membrane protein. Physical measurements, such as low-
angle neutron diffraction, can determine the overall
molecular shape, viz. the ratios of the principal
moments of inertia. One can then choose a protein of
known structure of the same class and similar size and
shape to use as a molecular stand-in and yardstick. It
is not essential to measure the moments of inertia of
the target because, at low resolution, all proteins of
a given size and class look much the same. The
specific volumes are the same, all are more or less
spherical and therefore all proteins of the same size
and class have about the same radius of curvature. The
radii of curvature of the two molecules determine how
much of the two molecules can come into contact.



Several graphical and computational tools are
needed or useful. The most appropriate method of
picking the residues of the protein chain at which the
amino acids should be varied is by viewing, with
interactive computer graphics, a model of the IPBD. A
stick-figure representation of molecules is preferred.
A suitable set of hardware is an Evans & Sutherland
PS390 graphics terminal (Evans & Sutherland
Corporation, Salt Lake City, UT) and a MicroVAX II
supermicro computer (Digital Equipment Corp., Maynard,
MA). The computer should, preferably, have at least
150 megabytes of disk storage, so that the Brookhaven
Protein Data Bank can be kept on line. A FORTRAN
compiler, or some equally good higher-level language
processor, is preferred for program development.
Suitable programs for viewing and manipulating protein
models include: a) PS-FRODO, written by T. A. Jones
(JONE85) and distributed by the Biochemistry Department
of Rice University, Houston, TX; and b) PROTEUS,
developed by Dayringer, Tramontano, and Fletterick
(DAYR86). Important features of PS-FRODO and PROTEUS
that are needed to view and manipulate protein models
for the purposes of the present invention are the
abilities to: 1) display molecular stick figures of
proteins and other molecules, 2) zoom and clip images
in real time, 3) prepare various abstract
representations of the molecules, such as a line
joining C_alphas and side group atoms, 4) compute and
display solvent-accessible surfaces reasonably quickly,
5) point to and identify atoms, and 6) measure distance
between atoms.

In addition, one could use theoretical
calculations, such as dynamic simulations of proteins,
to estimate whether a substitution at a particular
residue of a particular amino-acid type might produce a
protein of approximately the same 3D structure as the
parent protein. Such calculations might also indicate
whether a particular substitution will greatly affect
the flexibility of the protein; calculations of this
sort may be useful but are not required.

Sec. 13.1.1: The principal set:

In this section we pick a principal set of
residues of the IPBD to vary. Using the knowledge of
which residues are on the surface of the IPBD (as noted
above), we pick residues that are close enough together
on the surface of the IPBD to touch a molecule of the
target simultaneously without having any IPBD main-
chain atom come closer than van der Waals distance
(viz. 4.0 to 5.0 Å) from any target atom. For the
purposes of the present invention, a residue of the
IPBD "touches" the target if: a) a main-chain atom is
within van der Waals distance, viz. 4.0 to 5.0 Å, of any
atom of the target molecule, or b) the C_beta is within
D_cutoff of any atom of the target molecule so that a
side-group atom could make contact with that atom.
Because side groups differ in size (cf. Table 35), some
judgment is required in picking D_cutoff. In the
preferred embodiment, we will use D_cutoff = [illegible];
other values in the range 6.0 Å to 10.0 Å could be
used. If IPBD has G at a residue, we construct a
pseudo C_beta with the correct bond distance and angles
and judge the ability of the residue to touch the
target from this pseudo C_beta.
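The "touches" test described above can be expressed compactly in code. What follows is only a minimal sketch under stated assumptions: the function name residue_touches, the coordinate tuples, and the cutoff of 8.0 Å used in the usage lines are illustrative choices (8.0 Å merely lies inside the 6.0 Å to 10.0 Å range given in the text) and are not taken from the specification.

```python
import math

# Upper end of the 4.0-5.0 angstrom van der Waals range quoted in the text.
VDW_MAX = 5.0


def residue_touches(main_chain_atoms, c_beta, target_atoms, d_cutoff):
    """Return True if a residue meets criterion (a) or (b) of the text.

    (a) some main-chain atom lies within van der Waals distance of a target atom;
    (b) the C_beta (or a pseudo C_beta built for glycine) lies within d_cutoff
        of a target atom.
    All coordinates are (x, y, z) tuples in angstroms.
    """
    for t in target_atoms:
        if any(math.dist(m, t) <= VDW_MAX for m in main_chain_atoms):
            return True
        if math.dist(c_beta, t) <= d_cutoff:
            return True
    return False


# Hypothetical coordinates, chosen only to exercise both branches.
main_chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
c_beta = (1.0, 1.5, 0.0)
target = [(1.0, 9.0, 0.0)]          # 7.5 angstroms from the C_beta

touches_at_8 = residue_touches(main_chain, c_beta, target, d_cutoff=8.0)
touches_at_6 = residue_touches(main_chain, c_beta, target, d_cutoff=6.0)
```

With these made-up coordinates the residue counts as touching at the larger cutoff but not at the smaller one, which shows how the choice of D_cutoff changes the membership of the principal set.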

Alternatively, we choose a set of residues on the
surface of the IPBD such that the curvature of the
surface defined by the residues in the set is not so
great that it would prevent contact between all
residues in the set and a molecule of the target. This
method is appropriate if the target is a macromolecule,
such as a protein, because the PBDs derived from the
IPBD will contact only a part of the macromolecular
surface. The surfaces of macromolecules are irregular
with varying curvatures. If we pick residues that
define a surface that is not too convex, then there
will be a region on a macromolecular target with a
compatible curvature.

In addition to the geometrical criteria, we prefer
that there be some indication that the underlying IPBD
structure will tolerate substitutions at each residue
in the principal set of residues. Indications could
come from various sources, including: a) homologous
sequences, b) static computer modeling, or c) dynamic
computer simulations.

The residues in the principal set need not be
contiguous in the protein sequence. The exposed
surfaces of the residues to be varied do not need to be
connected. We require only that the amino acids in the
residues to be varied all be capable of touching a
molecule of the target material simultaneously without
having atoms overlap. If the target were, for example,
horse heart myoglobin, and if the IPBD were BPTI, any
set of residues in one interaction set of BPTI defined
in Table 34 could be picked.

Preferably, the principal set contains eight to
sixteen residues. This number of residues allows
sufficient variability that a surface that is
complementary to the target can be found, but is small
enough that a significant fraction of the surface can
be varied at one time.

Sec. 13.1.2: The secondary set:

The secondary set comprises those residues not in
the primary set that touch residues in the primary set.
These residues might be excluded from the primary set
because: a) the residue is internal, b) the residue is
highly conserved, or c) the residue is on the surface,
but the curvature of the IPBD surface prevents the
residue from being in contact with the target at the
same time as one or more residues in the primary set.

Internal residues are frequently conserved and the
amino acid type can not be changed to a significantly
different type without substantial risk that the
protein structure will be disrupted. Nevertheless,
some conservative changes of internal residues, such as
I to L or F to [illegible], are tolerated. Such
conservative changes affect the detailed placement and
dynamics of adjacent protein residues and such
variation may be useful once an SBD is found.

Surface residues in the secondary set are most
often located on the periphery of the principal set.
Such peripheral residues can not make direct contact
with the target simultaneously with all the other
residues of the principal set. The charge on the amino
acid in one of these residues could, however, have a
strong effect on binding. Once an SBD is found, it is
appropriate to vary the charge of some or all of these
residues. For example, the variegated codon containing
equimolar A and G at base 1, equimolar C and A at base
2, and A at base 3 yields amino acids T, A, K, and E
with equal probability.
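The codon example above can be checked by enumerating the codons that the mixture produces. The following is a minimal sketch using the standard genetic code; the names CODON_TABLE and encoded_amino_acids are illustrative assumptions, not part of the specification.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position; "*" is stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def encoded_amino_acids(base1, base2, base3):
    """Count codons per amino acid for equimolar mixtures at each position.

    Each argument is a string of the bases allowed at that codon position.
    """
    return Counter(CODON_TABLE[a + b + c] for a, b, c in product(base1, base2, base3))


# Equimolar A/G at base 1, equimolar C/A at base 2, A at base 3.
counts = encoded_amino_acids("AG", "CA", "A")
total = sum(counts.values())
fractions = {aa: n / total for aa, n in counts.items()}
```

The four codons ACA, AAA, GCA, and GAA each occur once, so the mixture offers one neutral polar, one small, one basic, and one acidic side group at equal frequency.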



Sec. 13.1.3: Choice of residues to vary initially:



Choice of residues in the primary and secondary
set is based on: a) geometry of the IPBD and the
geometrical relationship between the IPBD and the
target (or a stand-in for the target) in a hypothetical
complex, and b) sequences of proteins homologous to the
IPBD. In this section we pick a subset of the residues
in the primary and secondary sets, based on geometry
and on the maximum allowed level of variegation that
assures progressivity. The allowed level of
variegation determines how many residues can be varied
at once; geometry determines which ones.

The user may pick residues to vary in many ways;
the following is a preferred manner. Pairs of residues
are picked that are diametrically opposed across the
face of the principal set. Two such pairs are used to
delimit the surface, up/down and right/left.
Alternatively, three residues that form an inscribed
triangle, having as large an area as possible, on the
surface are picked. One to three other residues are
picked in a checkerboard fashion across the interaction
surface. Choice of widely spaced residues to vary
creates the possibility for high specificity because
all the intervening residues must have acceptable
complementarity before favorable interactions can occur
at widely-separated residues.

The number of residues picked is coupled to the
range through which each can be varied by the
restrictions discussed in Sec. 13.2. In the first
round, we do not assume any binding between IPBD and
the target and so progressivity is not an issue. At
the first round, the user may elect to produce a level
of variegation such that each molecule of vgDNA is
potentially different through, for example, unlimited
variegation of 10 codons (20^10 = approx. 10^13). One
run of the DNA synthesizer produces approximately 10^13
molecules of length 100 nts. Inefficiencies in
ligation and transformation will reduce the number of
proteins actually tested to between 10^7 and 5 x 10^8.
Multiple replications of the process with such very
high levels of variegation will not yield repeatable
results; the user must decide whether this is
important.

Having picked which residues to vary, we must now
decide the range of amino acids to allow at each
variable residue. The total level of variegation is
the product of the number of variants at each varied
residue. Each varied residue can have a different
scheme of variegation, producing 2 to 20 different
possibilities. We require that the process be
progressive, i.e. each variegation cycle produces a
better starting point for the next variegation cycle
than the previous cycle produced.



N.B.: Setting the level of variegation such
that the ppbd and many sequences related to
the ppbd sequence are present in detectable
amounts ensures that the process is
progressive. If the level of variegation is
so high that the ppbd sequence is present at
such low levels that there is an appreciable
chance that no transformant will display the
PPBD, then the best SBD of the next round
could be worse than the PPBD. At excessively
high level of variegation, each round of
mutagenesis is independent of previous rounds
and there is no assurance of progressivity.
This approach can lead to valuable binding
proteins, but repetition of experiments with
this level of variegation will not yield
progressive results. Excessive variation is
not preferred.



Hypothetical example 2 considers the effects of
the level of variegation on the progressivity of the
process of the present invention. Figure 6 is a
schematic view of a hypothetical eight-residue binding
surface of a PBD comprising residues 11, 24, 25, 30,
34, 42, 44, and 47 of a hypothetical protein. Each
polygon represents the exposed portion of one residue.
By hypothesis, there exists at least one protein, shown
in Figure 6e, having a specific amino acid in each of
the eight residues that will bind to the target, but we
do not, at first, know what that sequence is.

The IPBD, shown in Figure 6a, may have none of the
optimal amino acids on its surface. Because we begin
with no information, our initial estimate is that all
amino acids have equal likelihood of being the best at
each of the eight residues.

By hypothesis, the genetic engineering system of
hypothetical example 2 has M_ntv = 10^7 and the
selection-through-binding system has C_sensi = 1 in
10^[illegible]. Also by hypothesis, the variegation
method can produce all amino acids at a given residue
with equal probability.

In the first variegation, we vary residues 11, 24,
25, 34, and 44 through all twenty amino acids,
producing 20^5 = 3.2 x 10^6 sequences. The capabilities
of the genetic engineering system allow all these
sequences to be present in the selection step and the
selection system can detect 1 GP in 10^[illegible]. By
hypothesis, we isolate a GP carrying an sbd gene that
encodes the first SBD, shown in Figure 6b, that has
improved binding for the target and has the amino acid
sequence W11-F24-E25-G30-D34-E42-P44-T47. This amino
acid sequence becomes the parental sequence to the next
variegation. After the first variegation and
selection, the evidence favors W11, F24, E25, D34, and
P44 as optimal amino acids at their respective
residues. That residues 30, 42, and 47 were not varied
has two implications:

1) we still have no information about which amino
acid is optimal at these residues, and

2) the amino acids selected at the varied residues
are optimal, given the identities of the amino
acids in the non-varied residues; when residues
30, 42, and 47 are varied, our estimate of the
optimal amino acids in other residues may change.

Now consider two versions of a variegation that
take the first intermediate SBD as parent and that
might get us closer to the optimal SBD.

In the first version of the second variegation, we
vary only five residues, producing 3.2 x 10^6 sequences,
all of which are expressed and subjected to selection-
through-binding. We vary residues 30, 42, and 47
because they were not varied previously. We also vary
two other residues so that as many surfaces as possible
are tested; residues 24 and 34 are chosen. Suppose
that we isolate a GP that carries an sbd gene encoding
the amino acid sequence W11-L24-E25-I30-D34-R42-P44-
K47, shown in Figure 6c. Consider the reason that D is
retained at residue 34. We know that all the sequences
W11-L24-E25-I30-x34-R42-P44-K47 (where x runs through
all twenty amino acids) were tested and therefore can
conclude with improved confidence that D34 is optimal,
given the rest of the selected sequence. Now consider
the change at residue 24 from F to L. We know that all
the sequences W11-x24-E25-I30-D34-R42-P44-K47 were
tested and we can conclude that L24 is optimal, given
the rest of the sequence. At each of the varied
residues, we gain information about which amino acids
are optimal at each varied residue under the conditions
imposed.

In the second version, we will vary residues 11,
24, 30, 34, 42, and 47, each through all twenty amino
acids, producing 20^6 = 6.4 x 10^7 possible different
sequences. Our hypothesis is that only 1.0 x 10^7 of
these sequences are produced and subjected to
selection. Because only 15.6% of the programmed
sequences are actually subjected to selection, it is
likely that the parental sequence, W11-F24-E25-G30-D34-
E42-P44-T47, is not present in the selection step and
there is, consequently, no assurance that the best SBD
binds more tightly to target than did the parental PBD.
Suppose that we isolate a GP that carries an sbd gene
encoding the amino acid sequence V11-R24-E25-Q30-D34-
R42-P44-D47, shown in Figure 6d. Consider the reason
that D is retained at residue 34. Is it that D is
optimal, or is it that, by chance, the sequence
encoding the optimal amino acid, x, was not present as
V11-R24-E25-Q30-x34-R42-P44-D47 in the sample? We do
not know and therefore can not conclude that D34 is
optimal. Furthermore, retaining an amino acid can not
move us toward the optimal sequence. Now consider the
change at residue 24 from F to R. Was V11-R24-E25-Q30-
D34-R42-P44-D47 selected because R24 is optimal in the
presence of V11-x24-E25-Q30-D34-R42-P44-D47, or was V11-
R24-E25-Q30-D34-R42-P44-D47 selected because V11-F24-
E25-Q30-D34-R42-P44-D47 was not present to be selected?
Again, we do not know and can not conclude that R24 is
an improvement, i.e. we can not conclude that R24 is
more likely to be optimal than is F24. In both cases,
we lose information about which amino acids belong at
each residue. We may have obtained an SBD with
superior binding to the target. Another variegation
cycle at this level of variegation, however, may
produce a better protein or a worse protein and the
process is not progressive.
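The contrast between the two versions can be made numerical. The sketch below is illustrative only: it assumes that every programmed sequence is equally likely and that each transformant is an independent draw (a Poisson approximation the text does not state), and it takes 1.0 x 10^7 transformants, the figure implied by 15.6% of 20^6. The function names are hypothetical.

```python
import math


def fraction_sampled(n_transformants, n_sequences):
    """Upper bound on the fraction of programmed sequences that can be present."""
    return min(1.0, n_transformants / n_sequences)


def prob_present(n_transformants, n_sequences):
    """Chance that one given sequence appears at least once.

    Assumes equiprobable sequences and independent transformants, so the copy
    number of any one sequence is approximately Poisson distributed.
    """
    return 1.0 - math.exp(-n_transformants / n_sequences)


M_NTV = 1.0e7                      # transformants assumed for this example

version1 = prob_present(M_NTV, 20 ** 5)    # five residues varied
version2 = prob_present(M_NTV, 20 ** 6)    # six residues varied
sampled2 = fraction_sampled(M_NTV, 20 ** 6)
```

Under these assumptions the parental sequence is almost certainly present when five residues are varied, but is present in only about one experiment in seven when six residues are varied, which is why the second version cannot guarantee progress.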

Let us contrast versions 1 and 2 of the second
variegation. In version 1, we retained more
information, viz. that W11 allows improved binding, and
therefore our selection of K47 incorporates the
information obtained in the previous rounds. In
version 2 of the second variegation, we discarded the
information that W11 allows stronger binding than V11.

Progressivity is not an all-or-nothing property.
So long as most of the information obtained from
previous variegation cycles is retained and many
different surfaces that are related to the PPBD surface
are produced, the process is progressive. If the level
of variegation is so high that the ppbd gene may not be
detected, the assurance of progressivity diminishes.
If the probability of recovering PPBD is negligible,
then the probability of progressive behavior is also
negligible.

An opposing force in our design considerations is
that PBDs are useful in the population only up to the
amount that can be detected; any excess above the
detectable amount is wasted. Thus we produce as many
surfaces related to PPBD as possible within the
constraint that the PPBD be detectable.



We defer specification of exactly how much
variegation is allowed until we have: a) specified real
distributions for a variegated codon, and b)
examined the effects of discrepancies between specified
nt distributions and actual nt distributions.

Sec. 13.3: Design of vgDNA Encoding PBD Family:

We must now decide how to distribute the
variegation within the codons for the residues to be
varied. These decisions are influenced by the nature
of the genetic code. When vgDNA is synthesized,
variation at the first base of a codon creates a
population containing amino acids from the same column
of the genetic code table (as shown in the Table 3-6 on
p87 of WATS87); variation at the second base of the
codon creates a population containing amino acids from
the same row of the genetic code table; variation at
the third base of the codon creates a population
containing amino acids from the same box. If two or
three bases in the same codon are varied, the pattern
is more complicated. Work with 3D protein structural
models may suggest definite sets of amino acids to
substitute at a given residue, but the method of
variation may require either more or fewer kinds of
amino acids be included. For example, examination of a
model might suggest substitution of N or Q at a given
residue. Combinatorial variation of codons requires
that mixing N and Q at one location also include K and
H as possibilities at the same residue. One must
choose to put: 1) N only, 2) Q only, or 3) a mixture of
N, K, H, and Q. The present invention does not rely on
accurate predictions of which amino acids should be
placed at each residue; rather, attention is focused on
which residues should be varied.
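The N/Q example can be verified mechanically: choose one codon for each wanted amino acid, collect the bases needed at each codon position, and list everything the resulting combinatorial mixture encodes. This is a minimal sketch using the standard genetic code; the helper names codons_for and smallest_mixture are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position; "*" is stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons that encode the one-letter amino acid aa."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def smallest_mixture(wanted):
    """Find per-position base sets that reach every wanted amino acid.

    Tries every choice of one codon per wanted amino acid and keeps the choice
    whose combinatorial mixture encodes the fewest amino acids overall.
    Returns (bases needed at positions 1-3, set of amino acids encoded).
    """
    best = None
    for choice in product(*(codons_for(aa) for aa in wanted)):
        positions = [sorted({codon[i] for codon in choice}) for i in range(3)]
        encoded = {CODON_TABLE["".join(c)] for c in product(*positions)}
        if best is None or len(encoded) < len(best[1]):
            best = (positions, encoded)
    return best


positions, encoded = smallest_mixture("NQ")
```

No choice of codons for N and Q avoids the extra products, because N needs A at base 1 with a pyrimidine at base 3 while Q needs C at base 1 with a purine at base 3.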
There are many ways to generate diversity in a
protein. (See RICH86, CARU85, and OLIP86.) One extreme
case is that one or a few residues of the protein are
varied as much as possible (inter alia, see CARU85,
CARU87, RICH86, and WHAR86). We will call this limit
"Focused Mutagenesis". Focused Mutagenesis is
appropriate when the IPBD or other PPBD shows little or
no binding to the target, as at the beginning of the
search for a protein to bind to a new target material.
When there is no binding between the PPBD and the
target, we preferably pick a set of five to seven
residues on the surface and vary each through all 20
possibilities.

An alternative plan of mutagenesis ("Diffuse
Mutagenesis") that may be useful is to vary many more
residues through a more limited set of choices (See
Vershon et al., Ch15 of INOU86 and PAKU86). This can
be accomplished by spiking each of the pure nts
activated for DNA synthesis (e.g. nt-phosphoramidites)
with a small amount of one or more of the other
activated nts. Contrary to general practice, the
present invention sets the level of spiking so that
only a small percentage (1% to .00001%, for example)
of the final product will contain the initial DNA
sequence. This will ensure that many single, double,
triple, and higher mutations occur, but that recovery
of the basic sequence will be a possible outcome. Let
N_b be the number of bases to be varied, and let Q be
the fraction of all sequences that should have the
parental sequence; then M, the fraction of the mixture
that is the majority component, is

M = exp( log_e(Q)/N_b ) = 10^( log_10(Q)/N_b ).
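As a worked illustration of this relation, the sketch below evaluates M for thirty varied bases with 1% of the product retaining the parental sequence. The function name majority_fraction is an assumption made for illustration, and N_b and Q follow the symbols as read from the formula above.

```python
import math


def majority_fraction(q, n_bases):
    """Fraction M of the majority nt at each spiked base.

    q is the desired fraction of product carrying the parental sequence and
    n_bases is the number of bases varied; M satisfies M ** n_bases == q.
    """
    return math.exp(math.log(q) / n_bases)


m = majority_fraction(0.01, 30)          # roughly 0.858
m_base10 = 10 ** (math.log10(0.01) / 30)  # same value via the base-10 form
```

Each spiked base would therefore carry about 86% of the parental nt and about 14% of the other nts combined, and raising M to the thirtieth power recovers the requested 1%.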



If, for example, thirty base pairs on the DNA
chain were to be varied and 1% of the product is to
[...]
such as V to I, as well as some not so subtle changes
that require concomitant changes at two or more
residues of the protein.

For Focused Mutagenesis, we now consider the
distribution of nts that will be inserted at each
variegated codon. Each codon could be programmed
differently. If we have no information indicating that
a particular amino acid or class of amino acid is
appropriate, we strive to substitute all amino acids
with equal probability because representation of one
pbd above the detectable level is wasteful. Equal
amounts of all four nts at each position in a codon
yields the amino acid distribution:




4/64 A   2/64 C   2/64 D   2/64 E   2/64 F   4/64 G
2/64 H   3/64 I   2/64 K   6/64 L   1/64 M   2/64 N
4/64 P   2/64 Q   6/64 R   6/64 S   4/64 T   4/64 V
1/64 W   2/64 Y   3/64 stop



This distribution has the disadvantage of giving two
basic residues for every acidic residue. In addition,
six times as much R, S, and L as W or M occur. If five
codons are synthesized with this distribution,
sequences encoding five Rs are 7776 times more abundant
than sequences encoding five Ws. To have W-W-W-W-W
present at detectable levels, we must have R-R-R-R-R
present in 7776-fold excess.
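These figures follow directly from counting codons in the standard genetic code. A minimal sketch of that count is given here; the variable names are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position; "*" is stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# With equimolar nts at every position each codon has abundance 1/64,
# so abundances are proportional to the number of codons per amino acid.
counts = Counter(CODON_TABLE.values())

basic = counts["K"] + counts["R"]
acidic = counts["D"] + counts["E"]

amino_only = {aa: n for aa, n in counts.items() if aa != "*"}
most = max(amino_only.values())      # six codons: R, S, L
least = min(amino_only.values())     # one codon: W, M
fold_excess_five_codons = (most / least) ** 5
```

The count gives eight basic codons against four acidic ones, and a sixfold per-codon bias that compounds to 6^5 = 7776 over five variegated codons.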

Consider the distribution of amino acids encoded
by one codon in a population of vgDNA. Let Abun(x) be
the abundance of DNA sequences coding for amino acid x;
Abun(x) is uniquely defined by the distribution of nts
at each base of the codon. For any distribution, there
will be a most-favored amino acid (mfaa) with abundance
Abun(mfaa) and a least-favored amino acid (lfaa) with
abundance Abun(lfaa). We seek the nt distribution that
allows all twenty amino acids and that yields the
largest ratio Abun(lfaa)/Abun(mfaa) subject to two
constraints. First, the abundances of acidic and basic
amino acids should be equal lest we bias the PBDs
toward a particular charge. Second, the number of stop
codons should be kept as low as possible. Thus only nt
distributions that yield Abun(E)+Abun(D) =
Abun(R)+Abun(K) are considered, and the function
maximized is:

( (1-Abun(stop)) (Abun(lfaa)/Abun(mfaa)) ).

We have simplified the search for an optimal nt
distribution by limiting the third base to T or G; C or
G at the third base would be equivalent. All amino
acids are possible and the number of accessible stop
codons is reduced because TGA and TAA codons are
eliminated. The amino acids F, Y, C, H, N, I, and D
require T at the third base while W, M, Q, K, and E
require G. Thus we use an equimolar mixture of T and G
at the third base.

A computer program, written as part of the present
invention and named "Find Optimum vgCodon" (See Table
9), varies the composition at bases 1 and 2, in steps
of 0.05, and reports the composition that gives the
largest value of the quantity {(Abun(lfaa)/Abun(mfaa))
(1-Abun(stop))}. A vg codon is symbolically defined
by the nt distribution at each base:



             T     C     A     G
base #1 =    t1    c1    a1    g1
base #2 =    t2    c2    a2    g2
base #3 =    t3    c3    a3    g3

t1 + c1 + a1 + g1 = 1.0
t2 + c2 + a2 + g2 = 1.0
t3 = g3 = 0.5, c3 = a3 = 0.



The variation of the quantities t1, c1, a1, g1, t2, c2,
a2, and g2 is subject to the constraint that
Abun(E)+Abun(D) equals Abun(K)+Abun(R);

Abun(E)+Abun(D) = g1*a2

Abun(K)+Abun(R) = a1*a2/2 + c1*g2 + a1*g2/2

g1*a2 = a1*a2/2 + c1*g2 + a1*g2/2

Solving for g2, we obtain

g2 = (g1*a2 - 0.5*a1*a2)/(c1 + 0.5*a1).

In addition,

t1 = 1 - a1 - c1 - g1
t2 = 1 - a2 - c2 - g2
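The dependent quantities can be computed from the five free ones exactly as the relations above prescribe. The sketch below is illustrative only: the function name derived_fractions and the input values are assumptions chosen so that each base's fractions sum to 1.0, and it is not the program of Table 9.

```python
def derived_fractions(a1, c1, g1, a2, c2):
    """Compute t1, g2, and t2 from the free variables a1, c1, g1, a2, c2.

    g2 follows from the charge-balance constraint
    Abun(E)+Abun(D) = Abun(K)+Abun(R); t1 and t2 follow from normalization.
    Returns None when any derived fraction would be negative.
    """
    g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
    t1 = 1.0 - a1 - c1 - g1
    t2 = 1.0 - a2 - c2 - g2
    if min(t1, g2, t2) < 0.0:
        return None
    return t1, g2, t2


# Illustrative inputs (assumed values, not read from a table).
a1, c1, g1, a2, c2 = 0.26, 0.18, 0.30, 0.40, 0.16
t1, g2, t2 = derived_fractions(a1, c1, g1, a2, c2)

acidic = g1 * a2                                   # Abun(E)+Abun(D)
basic = a1 * a2 / 2 + c1 * g2 + a1 * g2 / 2        # Abun(K)+Abun(R)
```

A search program only needs to loop over the five free variables and discard combinations for which this function returns None.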



We vary a1, c1, g1, a2, and c2 and then calculate t1,
g2, and t2. Initially, variation is in steps of 5%.
Once an approximately optimum distribution of nts is
determined, the region is further explored with steps
of 1%. The logic of this program is shown in Table 9.
The optimum distribution is:

             Optimum vgCodon

             T      C      A      G
base #1 =    0.26   0.18   0.26   0.30
base #2 =    0.22   0.16   0.40   0.22
base #3 =    0.5    0.0    0.0    0.5



and yields DNA molecules encoding each type of amino
acid with the abundances shown in Table 10.
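The abundances implied by a given nt distribution can be recomputed from the standard genetic code. The following is a minimal sketch, not the program of Table 9. The function names are assumptions, and the C fraction at base 1 is taken as 0.18, the value that makes that row sum to 1.0 and satisfies the charge-balance constraint.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position; "*" is stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def abundances(p1, p2, p3):
    """Abundance of each amino acid (and stop) for one variegated codon.

    p1, p2, p3 map each base to its fraction at codon positions 1, 2, and 3.
    """
    abun = defaultdict(float)
    for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO):
        abun[aa] += p1[b1] * p2[b2] * p3[b3]
    return dict(abun)


def objective(abun):
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa), the quantity maximized."""
    amino = {aa: v for aa, v in abun.items() if aa != "*"}
    return (1.0 - abun["*"]) * min(amino.values()) / max(amino.values())


optimum = (
    dict(T=0.26, C=0.18, A=0.26, G=0.30),
    dict(T=0.22, C=0.16, A=0.40, G=0.22),
    dict(T=0.5, C=0.0, A=0.0, G=0.5),
)
abun = abundances(*optimum)
score = objective(abun)
```

With these fractions every amino acid has nonzero abundance, the only accessible stop codon is TAG, and acidic and basic abundances agree to within the rounding of g2.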

The computer that controls a DNA synthesizer, such
as the Milligen 7500, can be programmed to synthesize
any base of an oligo-nt with any distribution of nts by
taking some nt substrates (e.g. nt phosphoramidites)
from each of two or more reservoirs. Alternatively, nt
substrates can be mixed in any ratios and placed in one
of the extra reservoirs for so-called "dirty bottle"
synthesis. Either of these methods amounts to
specifying the nt distribution. The actual nt
distribution obtained will differ from the specified nt
distribution due to several causes, including: a)
differential inherent reactivity of nt substrates, and
b) differential deterioration of reagents. It is
possible to compensate partially for these effects, but
some residual error will occur. We denote the average
discrepancy between specified and observed nt fraction
as S_err:

S_err = square root( average[ ((f_obs - f_spec)/f_spec)^2 ] )

where f_obs is the amount of one type of nt found at a
base and f_spec is the amount of that type of nt that
was specified at the same base. The average is over
all specified types of nts and over a number (e.g. 10
or 20) of different variegated bases. By hypothesis, the
actual nt distribution at a variegated base will be
within 5% of the specified distribution. Actual DNA
synthesizers and DNA synthetic chemistry may have
different error levels. It is the user's
responsibility to determine S_err for the DNA
synthesizer and chemistry employed by the user.



To determine the possible effects of errors in nt
composition on the amino-acid distribution, we modified
the program "Find Optimum vgCodon" in four ways:

1) the fraction of each nt in the first two bases
is allowed to vary from its optimum value times (1
- S_err) to the optimum value times (1 + S_err) in
seven equal steps (S_err is the hypothetical
fractional error level entered by the user); the
sum of nt fractions at one base always equals 1.0,

2) g2 is varied in the same manner as a2, i.e. we
dropped the restriction that Abun(D)+Abun(E) =
Abun(K)+Abun(R),

3) t3 and g3 are varied from 0.5 times (1 - S_err)
to 0.5 times (1 + S_err) in equal steps,

4) the smallest ratio Abun(lfaa)/Abun(mfaa) is
sought.

In actual experiments, we will direct the synthesizer
to produce the optimum DNA distribution "Optimum
vgCodon" given above. Incomplete control over DNA
chemistry may, however, cause us to actually obtain the
following distribution that is the worst that can be
obtained if all nt fractions are within 5% of the
amounts specified in "Optimum vgCodon". A
corresponding table can be calculated for any given
S_err using the program "Find worst vgCodon within S_err
of given distribution." given in Table 11.



             Optimum vgCodon, worst 5% errors

             T       C       A       G
base #1 =    0.251   0.189   0.273   0.287
base #2 =    0.209   0.160   0.400   0.231
base #3 =    0.475   0.0     0.0     0.525



This distribution yields DNA encoding different 
amino acids at the abundances shown in Table 12. 

If five codons are synthesized with reagents mixed
so as to produce the nt-distribution "Optimum vgCodon",
and if we actually obtained the nt-distribution
"Optimum vgCodon, worst 5% errors", then DNA sequences
encoding the mfaa at all of the five codons are about
277 times as likely as DNA sequences encoding the lfaa
at all of the five codons; about 24% of the DNA
sequences will have a stop codon in one or more of the
five codons.

When five codons are synthesized using equimolar
mixtures at bases 1 and 2, (Abun(mfaa)/Abun(lfaa))^5 =
7776. If we program the optimum nt distribution and
come within 5%, then (Abun(mfaa)/Abun(lfaa))^5 = 277.
The total number of different PBDs is unchanged, but
the least-favored sequence is about 28 times more
abundant. Detecting the least-favored amino-acid
sequence when varying four residues with equimolar nts
at each varied base requires as sensitive a separation
system as does detecting the least-favored amino-acid
sequence when varying five residues with the optimized
nt distribution.
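The figures quoted for the worst-case distribution can be reproduced from the standard genetic code. This is a sketch under stated assumptions: the base 1 C fraction (0.189) and the base 2 C and A fractions (0.160, 0.400) are taken as the values that make each row sum to 1.0, and the function name is illustrative.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position; "*" is stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_abundances(p1, p2, p3):
    """Abundance of each amino acid (and stop) given per-position nt fractions."""
    abun = defaultdict(float)
    for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO):
        abun[aa] += p1[b1] * p2[b2] * p3[b3]
    return dict(abun)


worst = (
    dict(T=0.251, C=0.189, A=0.273, G=0.287),
    dict(T=0.209, C=0.160, A=0.400, G=0.231),
    dict(T=0.475, C=0.0, A=0.0, G=0.525),
)
abun = codon_abundances(*worst)
amino = {aa: v for aa, v in abun.items() if aa != "*"}

n_codons = 5
fold = (max(amino.values()) / min(amino.values())) ** n_codons
stop_in_any = 1.0 - (1.0 - abun["*"]) ** n_codons
gain_over_equimolar = 6 ** n_codons / fold
```

Under these assumptions the most-favored to least-favored ratio over five codons comes out near 277, close to a quarter of the sequences contain at least one stop codon, and the least-favored sequence is roughly 28-fold better represented than with equimolar nts.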



By hypothesis, the distribution "Optimal vgCodon"
is used in the second version of the second variegation
of hypothetical example 2. The abundance of the DNA
encoding each type of amino acid is, however, taken
from Table 12. The abundance of DNA encoding the
parental amino acid sequence is:

Amount(parental seq.)

      F24       G30       D34       E42       T47

  = Abun(F) * Abun(G) * Abun(D) * Abun(E) * Abun(T)
  = .0249 x .0663 x .0545 x .0602 x .0437
  = 2.4 x 10^-7

Therefore, DNA encoding the PPBD sequence as well as
very many related sequences will be present in
sufficient quantity to be detected and we are assured
that the process will be progressive.

We use the following procedure to determine
whether a given level of variegation is practical:

1) from: a) the intended nt-distribution at each
base of a variegated codon, and b) S_err (the error
level in mixed DNA synthesis), calculate the
abundances of DNA sequences coding for each amino
acid and stop,

2) calculate the abundance of DNA encoding the
PPBD sequence by multiplying the abundances of the
parental amino acid at each variegated residue.

The abundances used in the procedure above are
calculated from the worst distribution that is within
S_err of the specified distribution. A variegation that
ensures that the PPBD sequence can be recovered is
practical. PPBD can be recovered if the abundance of
PPBD-encoding DNA is larger than both 1/M_ntv and
1/C_sensi. Preferably, the abundance of PPBD-encoding
DNA is 3 to 10 times higher than both 1/M_ntv and
1/C_sensi to provide a margin of redundancy.



M_ntv is the number of transformants that can be made
from Y_D100 DNA. With current technology M_ntv is
approximately 5 x 10^[exponent illegible], but the exact
value depends on the details of the procedures adopted
by the user. Improvements in technology that allow more
efficient: a) synthesis of DNA, b) ligation of DNA, or
c) transformation of cells will raise the value of
M_ntv. C_sensi is the sensitivity of the affinity
separation; improvements in affinity separation will
raise C_sensi. If the smaller of M_ntv and C_sensi is
increased, higher levels of variegation may be used.
For example, if C_sensi is 1 in 10^[exponent illegible]
and M_ntv is 10^[exponent illegible], then improvements
in C_sensi are less valuable than improvements in M_ntv.
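
The recoverability test described above can be sketched as a single comparison. The function name, argument names, and the numeric values in the usage lines are assumptions chosen for illustration; only the rule itself (abundance larger than both 1/M_ntv and 1/C_sensi, preferably by a factor of 3 to 10) comes from the text.

```python
def variegation_is_practical(parental_abundance, m_ntv, c_sensi, margin=1.0):
    """Return True if the PPBD-encoding DNA should be recoverable.

    parental_abundance: fraction of the DNA population encoding the PPBD.
    m_ntv: number of transformants that can be made.
    c_sensi: sensitivity of the affinity separation (1 in c_sensi detectable).
    margin: safety factor; the text prefers a value between 3 and 10.
    """
    # The abundance must exceed both 1/M_ntv and 1/C_sensi, so the larger
    # of the two reciprocals is the binding constraint.
    threshold = max(1.0 / m_ntv, 1.0 / c_sensi)
    return parental_abundance > margin * threshold


# Hypothetical values for M_ntv and C_sensi, not figures from the text.
print(variegation_is_practical(2.4e-7, m_ntv=1e8, c_sensi=1e7))
print(variegation_is_practical(2.4e-7, m_ntv=1e8, c_sensi=1e7, margin=3.0))
```

With these assumed limits the parental sequence is recoverable without a margin but fails the preferred 3-fold margin, which is the situation in which the user would reduce the level of variegation.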

A level of variegation that allows recovery of the
PPBD has two properties:

1) we cannot regress because the PPBD is
available,

2) an enormous number of multiple changes related
to the PPBD are available for selection and we are
able to detect and benefit from these changes.

It is very unlikely that all of the variants will
be worse than the PPBD; we require the presence of PPBD
at detectable levels to ensure that all the sequences
present are indeed related to PPBD.

The user must adjust the list of residues to be
varied and levels of variegation at each residue until
the calculated variegation is within the bounds set by
M_ntv and C_sensi.



Preferably, we also consider the interactions
between the sites of variegation and the surrounding
DNA. If the method of mutagenesis to be used is
replacement of a cassette, we consider whether the
variegation will generate gratuitous restriction sites
and whether they seriously interfere with the intended
introduction of diversity. We reduce or eliminate
gratuitous restriction sites by appropriate choice of
variegation pattern and silent alteration of codons
neighboring the sites of variegation. See the Detailed
Example.



Sec. 14.1: Insertion of synthetic vgDNA into plasmids:

In the case of cassette mutagenesis, the
restriction sites that were introduced when the gene
for the inserted domain was synthesized are used to
introduce the synthetic vgDNA into a plasmid or other
OCV. Restriction digestions and ligations are
performed by standard methods (AUSU87).

In the case of single-stranded-oligonucleotide-
directed mutagenesis, synthetic vgDNA is used to create
diversity in the vector (BOTS85).





The present invention is not limited to any one
method of transforming cells with DNA. The following
procedure is a modification of that of Maniatis (p250,
MANI82). This procedure is only one example of how the
necessary transformations may be performed. The
procedure produces approximately (V_c/25) x
10^[exponent illegible] or more transformants. The user
picks a value for V_c, the initial volume of the cell
culture, to provide the desired number of
transformants. All water is triple distilled and is
treated with activated charcoal for 24 hours.

1) culture cells in V_c ml of LB broth at 37°C until
cell density reaches 5 x 10^[exponent illegible] to
[value illegible] cells/ml,

2) chill on ice for 65 minutes, centrifuge the cell
suspension at 4000g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in
[volume illegible] ml of an ice-cold, sterile solution
of 60 mM CaCl2,

4) chill on ice for 15 minutes, and then centrifuge at
4000g for 5 minutes at 4°C,

5) discard supernatant; resuspend cells in 2 x V_c/25
ml of ice-cold, sterile 60 mM CaCl2; store cells at
4°C for from 10 minutes to 24 hours; transformation
efficiency increases by about 4-fold in the first 24
hours and then returns to the original value,

6) add DNA in ligation or TE buffer to V_c/250 ml of
cells; mix and store on ice for 30 minutes,

7) heat shock cells at 42°C for an appropriate amount
of time,

8) add V_c/25 ml LB broth and incubate at 37°C for 1
hour,

9) plate cells on LB agar containing antibiotic,

10) harvest GPs in appropriate manner.

It is not necessary to isolate transformed cells
between transformation and affinity separation. We
prefer to have transformed cells at high concentration
so that they can be plated densely on relatively few
plates. For this purpose, steps (9) and (10) may be
replaced with a procedure in which the cells in step
(8) are further diluted with LB broth and the
selecting antibiotic is added. In the case of
ampicillin, lysis of sensitive cells occurs, and
resistant cells are enriched by centrifugation at 2 to
3 h after addition of antibiotic.

One routinely obtains between 10^[exponent illegible]
and 5 x 10^[exponent illegible] transformants/ug of CCC
DNA. Ligation efficiency ranges from 0.1% for
blunt-blunt insertions to as much as 15% for
sticky-sticky insertions. For large transformations, it
may be desirable to purify DNA between ligation and
transformation because unligated DNA is thought to
compete with CCC DNA for entry into the competent
cells. Only a small fraction of cells are competent,
typically [value illegible]. The heat shock has been
optimized for transformation reactions carried out in a
volume of 200 ul in a plastic Eppendorf tube;
optimizing this step for larger volumes is possible.
This procedure requires up to 2 ug DNA per
10^[exponent illegible] transformants.
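
The volumes in the numbered transformation procedure scale with V_c, so they can be tabulated for a chosen culture volume. This is a rough sketch under stated assumptions: the function and key names are hypothetical, the unit in step (8) is read as ml, and the resuspension volume of step (3) is omitted because it is not legible in the source.

```python
def transformation_volumes(v_c_ml):
    """Volumes (in ml) that scale with the initial culture volume V_c."""
    return {
        # step 5: final resuspension in ice-cold 60 mM CaCl2
        "final_cacl2_resuspension_ml": 2 * v_c_ml / 25,
        # step 6: volume of competent cells that receives the DNA
        "cells_receiving_dna_ml": v_c_ml / 250,
        # step 8: LB broth added after heat shock (unit assumed to be ml)
        "lb_broth_added_ml": v_c_ml / 25,
    }


# Hypothetical culture volume of 250 ml.
for name, volume in transformation_volumes(250.0).items():
    print(f"{name}: {volume:g}")
```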



Sec. 14.3: Growth of GP(vgPBD) population:

The transformed cells are grown first under non-
selective conditions that allow expression of plasmid
genes and then selected to kill untransformed cells.
Transformed cells are then induced to express the
osp-pbd gene at the appropriate level of induction, as
determined in Sec. 10.1. The GPs carrying the PBD
are harvested by a method appropriate to the package.

A high level of diversity can be generated by in
vitro variegated synthesis of DNA and this diversity
can be maintained passively through several
generations in an organism without positive selective
pressure. Loss or reduction in frequency of
deleterious mutations is advantageous for the purposes
of the present invention. As we do not know how one
might press E. coli or any other kind of cell to
actively maintain diversity, we specify that the vgDNA
must be used to prepare plasmids, that the plasmids
are used to transform cells, and that the selection
must be performed before more than a few generations
elapse. Moreover, subdividing the variegated
population before amplification in an organism by
removing a small sample (less than 10%) for further
work would result in loss of diversity; therefore, one
should use all or most of the synthetic DNA and most
or all of the transformed cells.

Sec. 15: Isolation of GP(PBD)s with binding-to-
target phenotypes:

The harvested packages are now enriched for the
binding-to-target phenotype by use of affinity
separation involving the target material immobilized
on an affinity matrix. Packages that fail to bind to
the target material are washed away. If the packages
are bacteriophage or endospores, it may be desirable
to include a bacteriocidal agent, such as azide, in
the buffer to prevent bacterial growth. The buffers
used in chromatography must include: a) any ions or
other solutes needed to stabilize the target, and b)
any ions or other solutes needed to stabilize the PBDs
derived from the IPBD.



Sec. 15.1: Attachment of target material to a column:



Affinity column chromatography is the preferred
method of affinity separation, but other affinity
separation methods may be used. A variety of
commercially available support materials for affinity
chromatography are used. These include derivatized
beads to which the target material is covalently
linked, or non-derivatized material to which the
target material adheres irreversibly.

Suppliers of support material for affinity
chromatography include: Applied Protein Technologies,
Cambridge; Bio-Rad Laboratories, Rockville Center,
NY; Pierce Chemical Company, Rockford, IL. Target
materials are attached to the matrix in accord with
the directions of the manufacturer of each matrix
preparation with consideration of good presentation of
the target.



Sec. 15.2: Reducing selection due to non-specific
binding:

We reduce non-specific binding of GP(PBD)s to the
matrix that bears the target in two ways:

1) we treat the column with blocking agents such
as genetically defective GPs or a solution of
protein before the population of GP(vgPBD)s is
chromatographed, and

2) we pass the population of GP(vgPBD)s over a
matrix containing no target or a different target
from the same class as the actual target prior to
affinity chromatography.

Step (1) above saturates any non-specific binding that
the affinity matrix might show toward wild-type GPs or
proteins in general; step (2) removes components of
our population that exhibit non-specific binding to
the matrix or to molecules of the same class as the
target. If the target were horse heart myoglobin, for
example, a column supporting bovine serum albumin
could be used to trap GPs exhibiting PBDs with strong
non-specific binding to proteins. If cholesterol were
the target, then a hydrophobic compound, such as
p-tertiarybutylbenzyl alcohol, could be used to remove
GPs displaying PBDs having strong non-specific binding
to hydrophobic compounds. It is anticipated that PBDs
that fail to fold or that are prematurely terminated
will be non-specifically sticky. These sequences
could outnumber the PBDs having desirable binding
properties. Thus, the capacity of the initial column
that removes indiscriminately adhesive PBDs should be
greater (e.g. 5 fold greater) than the column that
supports the target molecule.

Variation in the support material (polystyrene,
glass, agarose, cellulose, etc.) in analysis of clones
carrying SBDs is used to eliminate enrichment for
packages that bind to the support material rather than
the target.

Sec. 15.3: Eluting column:

To separate the GP(PBD)s that carry PBDs that
show actual binding to the target from GP(PBD)s that
carry PBDs that do not actually show binding to the
target, the population of GPs is applied to an
affinity matrix under conditions compatible with the
intended use of the binding protein and the population
is fractionated by passage of a gradient of some
solute over the column. The process enriches for PBDs
having affinity for the target and for which the
affinity for the target is least affected by the
eluants used. The enriched fractions are those
containing viable GPs that elute from the column at
greater concentration of the eluant.

Any ions or cofactors needed for stability of
PBDs (derived from IPBD) or target must be included in
initial and elution buffers at appropriate levels. We
first remove GP(PBD)s that do not bind the target by
washing the matrix with the volume of the initial
buffer required to bring the optical density (at 260
nm or 280 nm) back to base line plus one void volume
(V_v), but not more than 5 V_v. The column is then
eluted with a gradient of increasing: a) salt, b) [H+]
(decreasing pH), c) neutral solutes, d) temperature
(increasing or decreasing), or e) some combination of
these factors. The solutes in each of the first three
gradients have been found generally to weaken non-
covalent interactions between proteins and bound
molecules. Salt is the most preferred solute for
gradient formation in most cases. Other solutes that
generally weaken non-covalent interaction between
proteins and bound molecules may also be used. "Salt"
includes solutions containing any or all of the
following ionic species:



     Na+      K+       Ca++     NH4+     Li+
     Sr++     Ba++     Rb+      Cs+      Cl-
     Br-      SO4--    HSO4-    HPO4--   HCO3-
     Amino acids

Other ionic or neutral solutes may be used. All
solutes are subject to the necessity that they not
kill the genetic packages. Because bacteria continue
to metabolize during affinity separation, the choice
of buffer components is more restricted for bacteria
than for bacteriophage or spores. Neutral solutes,
such as ethanol, acetone, ether, or urea, are
frequently used in protein purification and are known
to weaken non-covalent interactions between proteins
and other molecules. Many of these species are,
however, very harmful to bacteria and bacteriophage.
Bacterial spores, on the other hand, are impervious to
most neutral solutes. Several passes may be made
through the steps in Sec. 15. Different solutes may
be used in different analyses, salt in one, pH in the
next, etc.

Sec. 15.4: Recovery of packages:

Recovery of packages that display binding to an
affinity column may be achieved in several ways,
including:

1) collect fractions eluted from the column with
a gradient as described above; fractions eluting
later in the gradient contain GPs more enriched
for genes encoding PBDs with high affinity for
the column,

2) elute the column with the target material in
soluble form,

3) flood the matrix with a nutritive medium and
grow the desired packages in situ,

4) remove parts of the matrix and use them to
inoculate growth medium,

5) chemically or enzymatically degrade the
linkage holding the target to the matrix so that
GPs still bound to target are eluted, or

6) degrade the packages and recover DNA with
phenol or other suitable solvent; the recovered
DNA is used to transform cells that regenerate
GPs.

30 

It is possible to utilize combinations of these
methods. It should be remembered that what we want to
recover from the affinity matrix is not the GPs per
se, but the information in them. Recovery of viable
GPs is very strongly preferred, but recovery of
genetic material is essential. If cells, spores, or
virions bind irreversibly to the matrix but are not
killed, we can recover the information through in situ
cell division, germination, or infection respectively.
Proteolytic degradation of the packages and recovery
of DNA is not preferred.

Although degradation of the bound GPs and
recovery of genetic material is a possible mode of
operation, inadvertent inactivation of the GPs is very
deleterious. It is preferred that maximum limits for
solutes that do not inactivate the GPs or denature the
target or the column are determined. If the affinity
matrices are expendable, one may use conditions that
denature the column to elute GPs; before the target is
denatured, a portion of the affinity matrix should be
removed for possible use as an inoculum. As the GPs
are held together by protein-protein interactions and
other non-covalent molecular interactions, there will
be cases in which the molecular package will bind so
tightly to the target molecules on the affinity matrix
that the GPs cannot be washed off in viable form.
This will only occur when very tight binding has been
obtained. In these cases, methods (3) through (5)
above can be used to obtain the bound packages or the
genetic messages from the affinity matrix.

It is possible, by manipulation of the elution
conditions, to isolate SBDs that bind to the target at
one pH (pH_b) but not at another pH (pH_o). The
population is applied at pH_b and the column is washed
thoroughly at pH_b. The column is then eluted with
buffer at pH_o and GPs that come off at the new pH are
collected and cultured. Similar procedures may be
used for other solution parameters, such as
temperature. For example, GP(vgPBD)s could be applied
to a column supporting insulin. After eluting with
salt to remove GPs with little or no binding to
insulin, we elute with salt and glucose to liberate
GPs that display PBDs that bind insulin or glucose in
a competitive manner.

Sec. 15.5: Amplifying the Enriched Packages:

Viable GPs having the selected binding trait are
amplified by culture in a suitable medium, or, in the
case of phage, infection into a host so cultivated.
If the GPs have been inactivated by the
chromatography, the OCV carrying the osp-pbd gene must
be recovered from the GP, and introduced into a new,
viable host.

Sec. 15.6: Determining whether further enrichment is
needed:

The probability of isolating a GP with improved
binding increases by C_eff with each separation cycle.
Let N be the number of distinct amino-acid sequences
produced by the variegation. We want to perform K
separation cycles before attempting to isolate an SBD,
where K is such that the probability of isolating a
single SBD is 0.10 or higher.

   K = the smallest integer >= log10(0.10 N) / log10(C_eff)

For example, if N were 1.0 x 10^6 and C_eff =
6.31 x 10^2, then log10(1.0 x 10^6) / log10(6.31 x 10^2) =
6.0000/2.8000 = 2.14. Therefore we would attempt to
isolate SBDs after the third separation cycle. After
only two separation cycles, the probability of finding
an SBD is (6.31 x 10^2)^2 / (1.0 x 10^6) = 0.40 and
attempting to isolate SBDs might be profitable.
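
The cycle count can be sketched numerically. This is a minimal illustration, assuming the probability of isolating an SBD after k cycles is C_eff^k / N capped at 1; the function names are hypothetical. Note that the formula as printed uses log10(0.10 N), whereas the worked example divides log10(N) by log10(C_eff); the sketch follows the printed formula, so for the example values it returns 2 rather than 3, which agrees with the remark that isolation might already be profitable after two cycles.

```python
import math


def cycles_needed(n_sequences, c_eff, p_min=0.10):
    """Smallest integer K with K >= log10(p_min * N) / log10(C_eff)."""
    k = math.ceil(math.log10(p_min * n_sequences) / math.log10(c_eff))
    return max(1, k)  # at least one separation cycle is always performed


def isolation_probability(n_sequences, c_eff, cycles):
    """Assumed probability of isolating an SBD after a number of cycles."""
    return min(1.0, c_eff ** cycles / n_sequences)


N = 1.0e6      # distinct amino-acid sequences (value from the example)
C_EFF = 631.0  # enrichment per separation cycle (value from the example)

print(cycles_needed(N, C_EFF))
print(round(isolation_probability(N, C_EFF, 2), 2))
```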



Clonal isolates from the last fraction eluted in
Sec. 15.3 containing any viable GPs, as well as clonal
isolates obtained by culturing an inoculum taken from
the affinity matrix, are cultured in a growth step
that is similar to that described in Sec. 14.3. If K
separation cycles have been completed, samples from a
number, e.g. 32, of these clonal isolates are tested
for elution properties on the {target} column. If
none of the isolated, genetically pure GPs show
improved binding to target, or if K cycles have not
yet been completed, then we pool and culture, in a
manner similar to the manner set forth in Sec. 14.3,
the GPs from the last few fractions eluted (see Sec.
15.4) that contained viable GPs and the GPs
obtained by culturing an inoculum taken from the
column matrix. We then repeat the enrichment
procedure described in Sec. 15. This cyclic
enrichment may continue N_chrom passes or until an SBD
is isolated.

If one or more of the isolated GPs has improved
retention on the {target} column, we determine whether
the retention of the candidate SBDs is due to affinity
for the target material as follows. A second column
is prepared using a different support matrix with the
target material bound at the optimal density. The
elution volumes, under the same elution conditions as
used previously (see Sec. 15.3), of candidate GP(SBD)s
are compared to each other and to GP(PPBD of this
round). If one or more candidate GP(SBD)s has a
larger elution volume than GP(PPBD of this round),
then we pick the GP(SBD) having the highest elution
volume and proceed to characterize the population (see
Sec. 15.7). If none of the candidate GP(SBD)s has
higher elution volume than GP(PPBD of this round),
then we pool and culture, in a manner similar to the
manner used previously (Sec. 15.3), the GPs from the
last few fractions that contained viable GPs and the
GPs obtained by culturing an inoculum taken from the
column matrix. We then repeat the enrichment
procedure of Sec. 15.
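
The comparison of elution volumes on the second column amounts to a simple selection rule, sketched here. The isolate names and volumes are invented for illustration, and the function name is an assumption; the rule itself (keep the candidate with the highest elution volume if it exceeds that of the PPBD of this round, otherwise return to enrichment) is the one stated above.

```python
def pick_best_candidate(candidate_volumes, ppbd_volume):
    """Return the candidate with the largest elution volume, or None.

    candidate_volumes: mapping of isolate name to elution volume.
    ppbd_volume: elution volume of GP(PPBD of this round) under the
        same elution conditions.
    """
    better = {name: vol for name, vol in candidate_volumes.items()
              if vol > ppbd_volume}
    if not better:
        # No candidate is retained longer than the PPBD: pool, culture,
        # and repeat the enrichment procedure.
        return None
    return max(better, key=better.get)


# Hypothetical elution volumes (ml) for three clonal isolates.
candidates = {"isolate_03": 41.0, "isolate_17": 55.5, "isolate_21": 38.2}
print(pick_best_candidate(candidates, ppbd_volume=40.0))
```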

If all of the SBDs show binding that is superior
to PPBD of this round, we pool and culture the GPs
from the last fraction that contains viable GPs and
from the inoculum taken from the column. This
population is re-chromatographed at least one pass to
fractionate further the GPs based on K_d.

If an RNA phage were used as GP, the RNA would
either be cultured with the assistance of a helper
phage or be reverse transcribed and the DNA amplified.
The amplified DNA could then be sequenced or subcloned
into suitable plasmids.

Sec. 15.7: Characterizing the population:

We characterize members of the population showing
desired binding properties by genetic and biochemical
methods. We obtain clonal isolates and test these
strains by genetic and affinity methods to determine
genotype and phenotype with respect to binding to
target. For several genetically pure isolates that
show binding, we demonstrate that the binding is
caused by the artificial chimeric gene by excising the
osp-sbd gene and crossing it into the parental GP. We
also ligate the deleted backbone of each GP from which
the osp-sbd is removed and demonstrate that each
backbone alone cannot confer binding to the target on
the GP. We sequence the osp-sbd gene from several
clonal isolates. Primers for sequencing are chosen
from the DNA flanking the osp-pbd gene or from parts
of the osp-pbd gene that are not variegated.

Sec. 15.8: Testing of binding affinity:

For one or more clonal isolates, we subclone the
sbd gene fragment, without the osp fragment, into an
expression vector such that each SBD can be produced
as a free protein. Because numerous unique
restriction sites were built into the inserted domain,
it is easy to subclone the gene at any time. Each SBD
protein is purified by normal means, including
affinity chromatography. Physical measurements of the
strength of binding are then made on each free SBD
protein by one of the following methods: 1) alteration
of the Stokes radius as a function of binding of the
target material, measured by characteristics of
elution from a molecular sizing column such as
agarose, 2) retention of radiolabeled binding protein
on a spun affinity column to which has been affixed
the target material, or 3) retention of radiolabeled
target material on a spun affinity column to which has
been affixed the binding protein. The measurements of
binding for each free SBD are compared to the
corresponding measurements of binding for the PPBD.

30 

In each assay, we measure the extent of binding
as a function of concentration of each protein, and
other relevant physical and chemical parameters such
as salt concentration, temperature, pH, and prosthetic
group concentrations (if any).

In addition, the SBD with highest affinity for
the target from each round is compared to the best SBD
of the previous round (IPBD for the first round) and
to the IPBD (second and later rounds) with respect to
affinity for the target material. Successive rounds
of mutagenesis and selection-through-binding yield
increasing affinity until desired levels are achieved.

If we find that the binding is not yet
sufficient, we must decide which residues to vary next
(see Sec. 16.0). If the binding is sufficient, then
we now have an expression vector bearing a gene
encoding the desired novel binding protein.



Sec. 15.9: Other Affinity Separation Means:



FACS may be used to separate GPs that bind
fluorescent labeled target with the optimized
parameters determined in Part II. We discriminate
against artifactual binding to the fluorescent label by
using two or more different dyes, chosen to be
structurally different. GPs isolated using target
labeled with a first dye are cultured. These GPs are
then tested with target labeled with a second dye.

Electrophoretic affinity separation uses
unaltered target so that only other ions in the buffer
can give rise to artifactual binding. Artifactual
binding to the gel material gives rise to retardation
independent of field direction and so is easily
eliminated. A variegated population of GPs will have
a variety of charges. The following 2D
electrophoretic procedure accommodates this variation
in the population.

First the variegated population of GPs is
electrophoresed in a gel that contains no target
material. The electrophoresis continues until the GPs
are distributed along the length of the lane. The
gels described by Serwer for phage are very low in
agarose and lack mechanical stability. The target-
free lane in which the initial electrophoresis is
conducted is separated from a square of gel that
contains target material by a removable baffle. After
the first pass, the baffle is removed and a second
electrophoresis is conducted at right angles to the
first. GPs that do not bind target migrate with
unaltered mobility while GPs that do bind target will
separate from the majority that do not bind target. A
diagonal line of non-binding GPs will form. This line
is excised and discarded. Other parts of the gel are
dissolved and the GPs cultured.
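
The logic of the 2D procedure can be sketched as a classification of gel positions. This sketch assumes the two passes are run so that a non-binding GP travels the same distance in both dimensions, which places it on the diagonal; the tolerance, the spot names, and the distances are hypothetical.

```python
def off_diagonal_spots(spots, tolerance=0.10):
    """Names of spots that lie off the diagonal of non-binding GPs.

    spots: mapping of name to (first-pass distance, second-pass distance).
    tolerance: fractional deviation from the diagonal treated as noise.
    """
    selected = []
    for name, (first_pass, second_pass) in spots.items():
        # A GP that binds target migrates differently in the second,
        # target-containing dimension and so leaves the diagonal.
        if abs(second_pass - first_pass) > tolerance * first_pass:
            selected.append(name)
    return selected


# Hypothetical migration distances in mm.
gel = {"gp_a": (30.0, 30.5), "gp_b": (30.0, 18.0), "gp_c": (12.0, 11.8)}
print(off_diagonal_spots(gel))
```

Only the off-diagonal positions would be dissolved and cultured; the diagonal corresponds to the line that is excised and discarded.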

Sec. 16.0: The Next Variegation Cycle:

We now consider which residues of the PBD should
be varied in the next variegation cycle. The general
rule is to preserve as much accumulated information as
possible. If the level of variegation in the previous
variegation cycle was correctly chosen, then the amino
acids selected to be in the residues just varied are
the ones best determined. The environment of other
residues has changed, so that it is appropriate to
vary them again. Because there are always more
residues in the principal (Sec. 13.1.1) and secondary
sets (Sec. 13.1.2) than can be varied simultaneously,
we start by picking residues that either have never
been varied (highest priority) or that have not been
varied for one or more cycles. If we find that

Sec. 17.0: OTHER CONSIDERATIONS:

Sec. 17.1: Joint selections:

One may modify the affinity separation of the
method described to select a molecule that binds to
material A but not to material B. One needs to
prepare two selection columns, one with material A and
the other with material B. The population of genetic
packages is prepared in the manner described, but
before applying the population to A, one passes the
population over the B column so as to remove those
members of the population that have high affinity for
B ("reverse affinity chromatography"). In the
preceding specification, the initial column supported
some other molecule simply to remove GP(PBD)s that
displayed PBDs having indiscriminate affinity for
surfaces.

20 

It may be necessary to amplify the population
that does not bind to B before passing it over A.
Amplification would most likely be needed if A and B
were in some ways similar and the PPBD has been
selected for having affinity for A. The optimum order
of interactions might be determined empirically.

For example, to obtain an SBD that binds A but
not B, three columns could be connected in series: a)
a column supporting some compound, neither A nor B, or
only the matrix material, b) a column supporting B,
and c) a column supporting A. A population of
GP(vgPBD)s is applied to the series of columns and the
columns are washed with the buffer of constant ionic
strength that is used in the application. The columns
are uncoupled, and the third column is eluted with a
gradient to isolate GP(PBD)s that bind A but not B.
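
The three-column series behaves like a sequence of filters, which can be sketched with sets. This is a toy model under a strong assumption: each package either binds a given column material or does not, with no partial retention. The package names and binding sets are hypothetical.

```python
def joint_selection(packages, columns=("matrix", "B", "A")):
    """Pass packages through columns in series; return what each retains.

    packages: mapping of package name to the set of materials it binds.
    columns: column materials in the order the population meets them.
    """
    flowing = dict(packages)
    retained = {}
    for column in columns:
        # Packages that bind this column stay on it; the rest flow on.
        retained[column] = {name for name, binds in flowing.items()
                            if column in binds}
        flowing = {name: binds for name, binds in flowing.items()
                   if name not in retained[column]}
    return retained


population = {
    "gp1": {"A"},            # binds A only: the desired phenotype
    "gp2": {"A", "B"},       # removed by the B column
    "gp3": {"matrix", "A"},  # indiscriminately sticky, removed first
    "gp4": set(),            # binds nothing, washes through
}
print(sorted(joint_selection(population)["A"]))
```

Only packages that reach the third column and bind A are retained there, so eluting that column recovers the packages that bind A but not B.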

One can also generate molecules that bind to both
A and B. In this case we can use a 3D model and
mutate one face of the molecule in question to get
binding to A. One can then mutate a different face
to produce binding to B. When an SBD binds at least
somewhat to both A and B, one can mutate the chain by
Diffuse Mutagenesis to refine the binding and use a
sequential joint selection for binding to both A and
B.

The materials A and B could be proteins that
differ at only one or a few residues. For example, A
could be a natural protein for which the gene has been
cloned and B could be a mutant of A that retains the
overall 3D structure of A. SBDs selected to bind A
but not B must bind to A near the residues that are
mutated in B. If the mutations were picked to be in
the active site of A (assuming A has an active site),
then an SBD that binds A but not B will bind to the
active site of A and is likely to be an inhibitor of
A.

To obtain a protein that will bind to both A and
B, we can, alternatively, first obtain an SBD that
binds A and a different SBD that binds B. We can then
combine the genes encoding these domains so that a
two-domain single-polypeptide protein is produced.
The fusion protein will have affinity for both A and B
because one of its domains binds A and the other binds
B.

One can also generate binding proteins with
affinity for both A and B, such that these materials
will compete for the same site on the binding protein.
We guarantee competition by overlapping the sites for
A and B. Using the procedures of the present
invention, we first create a molecule that binds to
target material A. We then vary a set of residues
defined as: a) those residues that were varied to
obtain binding to A, plus b) those residues close in
3D space to the residues of set (a) but that are
internal and so are unlikely to bind directly to
either A or B. Residues in set (b) are likely to make
small changes in the positioning of the residues in
set (a) such that the affinities for A and B will be
changed by small amounts. Members of these
populations are selected for affinity to both A and B.

Sec. 17.2: Selection for non-binding:

The method of the present invention can be used
to select proteins that do not bind to selected
targets. Consider a protein of pharmacological
importance, such as streptokinase, that is antigenic
to an undesirable extent. We can take the
pharmacologically important protein as IPBD and
antibodies against it as target. Residues on the
surface of the pharmacologically important protein
would be variegated and GP(PBD)s that do not bind to
an antibody column would be collected and cultured.
Surface residues may be identified in several ways,
including: a) from a 3D structure, b) from
hydrophobicity considerations, or c) chemical
labeling. The 3D structure of the pharmacologically
important protein remains the preferred guide to
picking residues to vary, except now we pick residues
that are widely spaced so that we leave as little as
possible of the original surface unaltered.

Destroying binding frequently requires only that
a single amino acid in the binding interface be
changed. If polyclonal antibodies are used, we face
the problem that all or most of the strong epitopes
must be altered in a single molecule. Preferably, one
would have a set of monoclonal antibodies, or a narrow
range of antibody species. If we had a series of
monoclonal antibody columns, we could obtain one or
more mutations that abolish binding to each monoclonal
antibody. We could then combine some or all of these
mutations in one molecule to produce a
pharmacologically important protein recognized by none
of the monoclonal antibodies. Such mutants must be
tested to verify that the pharmacologically
interesting properties have not been altered to an
unacceptable degree by the mutations.

Typically, polyclonal antibodies display a range
of binding constants for antigen. Even if we have
only polyclonal antibodies that bind to the
pharmacologically important protein, we may proceed as
follows. We engineer the pharmacologically important
protein to appear on the surface of a replicable GP.
We introduce mutations into residues that are on the
surface of the pharmacologically important protein or
into residues thought to be on the surface of the
pharmacologically important protein so that a
population of GPs is obtained. Polyclonal antibodies
are attached to a column and the population of GPs is
applied to the column at low salt. The column is
eluted with a salt gradient. The GPs that elute at
the lowest concentration of salt are those which bear
pharmacologically important proteins that have been
mutated in a way that eliminates binding to the
antibodies having maximum affinity for the
pharmacologically important protein. The GPs eluting
at the lowest salt are isolated and cultured. The
isolated SBD becomes the PPBD to further rounds of
variegation so that the antigenic determinants are
successively eliminated.



Sec. 17.3: Selection of PBDs for retention of
structure:



Let us take an SBD with known affinity for a
target as PPBD to a variegation of a region of the PBD
that is far from the residues that were varied to
create the SBD. We can use the target as an affinity
molecule to select the PBDs that retain binding for
the target, and that presumably retain the underlying
structure of the IPBD. The variegations in this case
could include insertions and deletions that are likely
to disrupt the IPBD structure. We could also use the
IPBD and AfM(IPBD) in the same way.

For example, if IPBD were BPTI and AfM(BPTI) were
trypsin, we could introduce four or five additional
residues after residue 26 and select GPs that display
PBDs having specific affinity for AfM(BPTI). Residue
26 is chosen because it is in a turn and because it is
about 25 Å from K15, a key amino acid in binding to
trypsin.



The underlying structure is most likely to be
retained if insertions or deletions are made at loops
or turns.

Sec. 17.4: Binding proteins not unique:



For each target, there are a large number of SBDs
that may be found by the method of the present
invention. The process relies on a combination of
protein structural considerations, probabilities, and
targeted mutations with accumulation of information.
To increase the probability that some PBD in the
population will bind to the target, we generate as
large a population as we can conveniently subject to
selection-through-binding in one experiment. Key
questions in management of the method are "How many
transformants can we produce?", and "How small a
component can we find through selection-through-
binding?". Geneticists routinely find mutations with
frequencies of one in 10^10 using simple, powerful
selections; we experimentally determine the
sensitivity of our procedure. The optimum level of
variegation is determined by the maximum number of
transformants and the selection sensitivity, so that
for any reasonable sensitivity we may use a
progressive process to obtain a series of proteins
with higher and higher affinity for the chosen target
material. Enrichments of 1000-fold by a single pass
of elution from an affinity plate have been
demonstrated (SMIT85). Three rounds of such
enrichment could produce 10^9-fold enrichment, and
additional rounds may be added if necessary.
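
The enrichment arithmetic above can be sketched as a short
calculation. This is only an illustration under a simple
assumption that is not stated in the text: each round multiplies
the ratio of binders to non-binders by a constant factor. The
function names are hypothetical.

```python
from fractions import Fraction


def total_enrichment(fold_per_round, rounds):
    """Cumulative enrichment after a number of identical rounds."""
    return fold_per_round ** rounds


def rounds_needed(initial_fraction, fold_per_round,
                  target_fraction=Fraction(1, 2)):
    """Count rounds until binders reach target_fraction of the population.

    Assumes (for illustration) that every round multiplies the
    binder:non-binder odds by fold_per_round.
    """
    odds = initial_fraction / (1 - initial_fraction)
    target_odds = target_fraction / (1 - target_fraction)
    rounds = 0
    while odds < target_odds:
        odds *= fold_per_round
        rounds += 1
    return rounds


# Three 1000-fold rounds give a 10^9-fold cumulative enrichment.
print(total_enrichment(1000, 3))
# Rounds needed to bring a 1-in-10^9 binder up to half the population.
print(rounds_needed(Fraction(1, 10**9), 1000))
```

Exact fractions are used so that the round count does not depend
on floating-point rounding at the threshold.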

Use of different variation schemes can yield
different binding proteins. For any given target,
there is a large plurality of proteins that will bind
to it. Thus, if one binding protein turns out to be
unsuitable for some reason (e.g. too antigenic), the
procedure can be repeated with different variation
parameters. For example, one might choose different
residues to vary or pick a different nt distribution
at variegated codons so that a new distribution of
amino acids is tested at the same residues. Even if
the same principal set of residues is used, one might
obtain a different SBD if the order in which one picks
subsets to be varied is altered.
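
How a nucleotide distribution at a variegated codon maps to an
amino-acid distribution can be sketched as below. The standard
genetic code is used; the particular mixture shown (equimolar
bases at the first two positions, G or T at the third) is an
illustrative assumption and is not a distribution prescribed by
the text.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)
}


def amino_acid_distribution(pos1, pos2, pos3):
    """Amino-acid probabilities for independent per-position base mixes.

    Each argument maps a base letter to its probability at that
    codon position; '*' in the result denotes a stop codon.
    """
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = (pos1.get(codon[0], 0) * pos2.get(codon[1], 0)
             * pos3.get(codon[2], 0))
        if p:
            dist[aa] = dist.get(aa, 0) + p
    return dist


equimolar = {b: Fraction(1, 4) for b in BASES}
g_or_t = {"G": Fraction(1, 2), "T": Fraction(1, 2)}
dist = amino_acid_distribution(equimolar, equimolar, g_or_t)
for aa in sorted(dist):
    print(aa, dist[aa])
```

Changing any of the three per-position mixtures changes which
amino acids are sampled and how often, which is the lever the
paragraph above describes.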

Sec. 17.5: Other modes of mutagenesis possible:

The modes of creating diversity in the population
of GPs discussed herein are not the only modes
possible. Any method of mutagenesis that preserves at
least a large fraction of the information obtained
from one selection and then introduces other mutations
in the same domain will work. The limiting factors
are the number of independent transformants that can
be produced and the amount of enrichment one can
achieve through affinity separation. Therefore the
preferred embodiment uses a method of mutagenesis that
focuses mutations into those residues that are most
likely to affect the binding properties of the PBD and
are least likely to destroy the underlying structure
of the IPBD.

Other modes of mutagenesis might allow other GPs
to be considered. For example, the bacteriophage
lambda is not a useful cloning vehicle for cassette
mutagenesis because of the plethora of restriction
sites. One can, however, use single-stranded-oligo-
nt-directed mutagenesis on lambda without the need for
unique restriction sites. No one has used single-
stranded-oligo-nt-directed mutagenesis to introduce
the high level of diversity called for in the present
invention, but if it is possible, such a method would
allow use of phage with large genomes.



Presented below is a hypothetical example of a
protocol for developing a new binding molecule derived
from BPTI with affinity for horse heart myoglobin
(HHMb) using the common E. coli bacteriophage M13 as
genetic package. It will be understood that some
further optimization, in accordance with the teachings
herein, may be necessary to obtain the desired results.
Possible modifications in the preferred method are
discussed immediately following various steps of the
hypothetical example.

By hypothesis, we set the following technical
capabilities:

Yqq     500 ng/synthesis of ssDNA 100 bases long,
        10 ug/synthesis of ssDNA 60 bases long,
        1 mg/synthesis of ssDNA 20 bases long.

•"^DNA baces 
V^l 1 mg/l 

Leff    0.1 % for blunt-blunt,
        4 % for sticky-blunt,
        11 % for sticky-sticky.



".ntv 



5 X 10^ 



17S 



Ceff    900-fold enrichment

Nchrom  10 passes

Example 1. Part I

In this example, we will use M13 as a replicable
GP and BPTI as IPBD. The considerations that lead to
these choices are discussed. In Part I, we are
concerned only with getting BPTI displayed on the outer
surface of an M13 derivative. Variable DNA may be
introduced in the osp-ipbd gene, but not within the
region that codes for the trypsin-binding region of
BPTI. Once BPTI is displayed on the outer surface
of an M13 derivative, we proceed to Part II to optimize
the affinity separation procedures.

We consider various GPs and, for this example,
choose a filamentous bacteriophage of E. coli, M13. We
prefer phage over vegetative bacterial cells because
phage are much less metabolically active. We prefer
phage over spores because the molecular mechanisms of
the virion formation and 3D structure of the virion are
much better understood than are the corresponding
processes of spore formation and structures of spores.

M13 is a very well studied bacteriophage, widely
used for DNA sequencing and as a genetic vector; it is
a typical member of the class of filamentous phages.
The relevant facts about M13 and other phages that will
allow us to choose among phages are cited in Sec.
1.3.1.

Compared to other bacteriophage, filamentous phage
in general are attractive and M13 in particular is
especially attractive because:

1) the 3D structure of the virion is known,

2) the processing of the coat protein is well
understood,

3) the genome is expandable,

4) the genome is small,

5) the sequence of the genome is known,

6) the virion is physically resistant to shear,
heat, cold, guanidinium Cl, low pH, and high salt,

7) the phage is a sequencing vector so that
sequencing is especially easy, and

8) antibiotic-resistance genes have been cloned
into the genome with predictable results (HINE80).

Other criteria listed in Sec. 1.0 and 1.3 are
also satisfied: M13 is easily cultured and stored
(FRIT85), each infected cell yielding 100 to 1000 M13
progeny after infection. M13 has no unusual or
expensive media requirements and is easily harvested
and concentrated (SALI64, FRIT85). M13 is stable
toward physical agents: temperature (10% of phage
survive 30 minutes at 85°C), shear (Waring blender does
not kill), desiccation (not applicable), radiation (not
applicable), age (stable for years).

„U is stable toward cheoicals: pH (< 2.2 
S ,SMITS3)), surface active agents: not ^PP^^"^ - 
Caotropcs (guanidiniu. UCl = .-C «, . ions = 
sensitivities,, organic solvents (ether and o .c 
organic solvents are lethal (MARV78)), proteases (not
applicable, M13 is not a protease). M13 is not known to
be sensitive to other enzymes.

M13 genome is 6423 b.p. and the sequence is known
(SCHA78). Because the genome is small, cassette
mutagenesis is practical on RF M13 (AUSU87), as is
single-stranded oligo-nt directed mutagenesis (FRIT85).
M13 is a plasmid and transformation system in itself,
and an ideal sequencing vector. M13 can be grown on
Rec- strains of E. coli. The M13 genome is expandable
(MESS78, FRIT85). M13 confers no advantage, but
doesn't lyse cells. The sequence of gene VIII is
known, and the amino acid sequence can be encoded on a
synthetic gene, using the lacUV5 promoter and used in
conjunction with the lacIq repressor. The lacUV5
promoter is induced by IPTG. Gene VIII protein is
secreted by a well studied process and is cleaved
between A23 and A24. Residues 18, 21, 22, and 23 of
gene VIII protein control cleavage. Mature gene VIII
protein makes up the sheath around the circular ssDNA.
The 3D structure of f1 virion is known at medium
resolution; the amino terminus of gene VIII protein is
on the surface of the virion. No fusions to M13 gene VIII
protein have been reported. The 2D structure of M13
coat protein is implicit in the 3D structure. Mature
M13 gene VIII protein has only one domain. There are
four minor proteins: gene III, VI, VII, and IX. Each
of these minor proteins is present in about 5 copies
per virion and is related to morphogenesis or
infection. The major coat protein is present in more
than 2500 copies per virion.

Although no fusions of M13 gene VIII to other
genes have been reported, knowledge of the virion 3D
structure makes attachment of IPBD to the amino
terminus of mature M13 coat protein (M13 CP) quite
attractive (See Sec. 1.3.2). Should direct fusion of
BPTI to M13 CP fail to cause BPTI to be displayed on
the surface of M13, we will vary part of the BPTI
sequence and/or insert short random DNA sequences
between BPTI and M13 CP (Sec. 1.3.4).

Smith (SMIT85) and de la Cruz et al. (CRUZ88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. If BPTI
can not be made to appear on the virion outer surface
by fusing the bpti gene to the M13 CP gene, we will fuse
bpti to gene III either at the site used by Smith and
by de la Cruz et al. or to one of the termini. We will
use a second, synthetic copy of gene III so that some
unaltered gene III protein will be present.

The gene VIII protein is chosen as OSP because it
is present in many copies and because its location and
orientation in the virion are known. Note that any
uncertainty about the azimuth of the coat protein about
its own alpha helical axis is unimportant; the amino
terminus is exposed for all azimuths.

The 3D model of f1 indicates strongly that fusing
BPTI to the amino terminus of M13 CP is more likely to
yield a functional protein than any other fusion site.
(See Sec. 1.3.3).

The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is

AA_seq1
         1    1    2    2    3    3    4    4    5
    5    0    5    0    5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA
                      ^ (arrow: cleavage after A23)

    5    6    6    7  7
    5    0    5    0  3
MVVVIVGATIGIKLFKKFTSKAS

The single-letter codes for amino acids and the codes
for ambiguous DNA are given in Table 1. The best site
for inserting a novel protein domain into M13 CP is
after A23 because SP-I cleaves the precoat protein
after A23, as indicated by the arrow. Proteins that
can be secreted will appear connected to mature M13 CP
at its amino terminus. Because the amino terminus of
mature M13 CP is located on the outer surface of the
virion, the introduced domain will be displayed on the
outside of the virion.

BPTI is chosen as IPBD of this example (See Sec.
2.1) because it meets or exceeds all the criteria: it
is a small, very stable protein with a well known 3D
structure. Marks et al. (MARK86) have shown that a
fusion of the phoA signal peptide gene fragment and DNA
coding for the mature form of BPTI caused native BPTI
to appear in the periplasm of E. coli, demonstrating
that there is nothing in the structure of BPTI to
prevent its being secreted.

t 

Marks et al. (MARK87) also showed that the
structure of BPTI is stable even to the removal of one
of the cystine bridges. They did this by replacing
both C14 and C38 with either two alanines or two
threonines. The C14/C38 cystine bridge that Marks et
al. removed is the one very close to the scissile bond
in BPTI; surprisingly, both mutant molecules
functioned as trypsin inhibitors. This indicates that
BPTI is redundantly stable and so is likely to fold
into approximately the same structure despite numerous
surface mutations. Using the knowledge of homologues,
vide infra, we can infer which residues must not be
varied if the basic BPTI structure is to be maintained.

The 3D structure of BPTI has been determined at
high resolution by X-ray diffraction (HUBE77, MARQ83,
WLOD84, WLOD87a, WLOD87b), neutron diffraction
(WLOD84), and by NMR (WAGN87). In one of the X-ray
structures, deposited in the Brookhaven Protein Data
Bank, "6PTI", there was no electron density for
A58, indicating that A58 has no uniquely defined
conformation. Thus we know that the carboxy group does
not make any essential interaction in the folded
structure. The amino terminus of BPTI is very near to
the carboxy terminus. Goldenberg and Creighton
reported on circularized BPTI and circularly permuted
BPTI (GOLD83). Some proteins homologous to BPTI have
more or fewer residues at either terminus.

BPTI has been called "the hydrogen atom of protein
folding" and has been the subject of numerous
experimental and theoretical studies (STAT87, SCHW87,
GOLD83, CHAZ83).

BPTI has the added advantage that at least 32
homologous proteins are known, as shown in Table 13. A
tally of ionizable groups is shown in Table 14 and the
composite of amino acid types occurring at each residue
is shown in Table 15.

BPTI is freely soluble and is not known to bind
metal ions. BPTI has no known enzymatic activity.
BPTI binds to trypsin, Kd = 6.0 x 10^-14 M (TSCH87).
BPTI is not toxic. If K15 of BPTI is changed to L,
there is no measurable binding between the mutant BPTI
and trypsin (TSCH87).
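
To put the dissociation constant in energetic terms, the
standard relation dG = RT ln(Kd) can be evaluated. This is a
minimal sketch; the temperature of 298.15 K and the 1 M standard
state are assumptions made for illustration, and the Kd used is
the commonly quoted 6.0 x 10^-14 M for the BPTI-trypsin complex.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard free energy of binding (kcal/mol) from a Kd in mol/L."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


dg = binding_free_energy(6.0e-14)
print(f"dG of binding: {dg:.1f} kcal/mol")
```

Under these assumptions the value comes out near -18 kcal/mol,
which conveys how tight this reference interaction is.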

Stereo Figure 7 shows the alpha carbons of BPTI
plus the side groups of conserved residues; all four
atoms of conserved glycines are shown. All of the
conserved residues are buried; of the seven fully
conserved residues only G37 has noticeable exposure.
The solvent accessibility of each residue in BPTI is
given in Table 16 which was calculated from the entry
"6PTI" in the Brookhaven Protein Data Bank with a
solvent radius of 1.4 Å, the atomic radii given in
Table 7, and the method of Lee and Richards (LEE71).
Each of the 51 non-conserved residues can accommodate
two or more kinds of amino acids. By independently
substituting at each residue only those amino acids
already observed at that residue, we could obtain
approximately 7 x 10"'^ different amino acid sequences,
most of which will fold into structures very similar to
BPTI.
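
The count of sequences obtainable by independent substitution is
simply the product, over residues, of the number of amino-acid
types observed at each residue. The sketch below shows the
calculation; the small table of observed types is hypothetical
and is not the data of Table 15.

```python
import math


def count_sequences(observed_types):
    """Number of distinct sequences from independent substitution.

    observed_types maps a residue number to the set of amino-acid
    types seen at that residue among the homologues.
    """
    return math.prod(len(types) for types in observed_types.values())


# Hypothetical three-residue illustration (not taken from Table 15).
toy = {1: {"R", "K"}, 2: {"P"}, 3: {"D", "E", "N"}}
print(count_sequences(toy))

# Lower bound for 51 variable residues with at least two types each.
print(2 ** 51)
```

Because each of the 51 non-conserved residues admits at least two
types, the count is at least 2^51, and it grows quickly when
several residues admit many types.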



BPTI will be useful as an IPBD for macromolecules.
(See Sec. 2.1.1) BPTI and BPTI homologues bind tightly
and with high specificity to a number of enzymes.

BPTI is strongly positively charged except at very
high pH, thus BPTI is useful as IPBD for targets that
are not also strongly positive under the conditions of
intended use (see Sec. 2.1.2). There exist homologues
of BPTI, however, having quite different charges (viz.
SCI-III from Bombyx mori at -7 and the trypsin
inhibitor from bovine colostrum at -1). Once a
derivative of M13 is found that displays BPTI on its
surface, the sequence of the BPTI domain can be
replaced by one of the homologous sequences to produce
acidic or neutral IPBDs.

BPTI is not an enzyme (See Sec. 2.1.3). BPTI is
quite small; if this should cause a pharmacological
problem, two or more BPTI-derived domains may be joined
as in the human BPTI homologue that has two domains.

A derivative of M13 is the preferred OCV. (See
Sec. 3). Wild-type M13 does not confer any resistances
on infected cells; M13 is a pure parasite. A
"phagemid" is a hybrid between a phage and a plasmid,
and is used in this invention. Double-stranded plasmid
DNA isolated from phagemid-bearing cells is denoted by
the standard convention, e.g. pXY24. Phage prepared
from these cells would be designated XY24. Phagemids
such as Bluescript K/S (sold by Stratagene) are not
suitable for our purposes because Bluescript does not
contain the full genome of M13 and must be rescued by
coinfection with competent wild-type M13. Such
coinfections will likely lead to genetic recombination
yielding heterogeneous phage unsuitable for the
purposes of the present invention.

It is also well known that plasmids containing the
ColE1 origin of replication can be greatly amplified if
protein synthesis is halted in log-phase culture.
Protein synthesis can be halted by addition of
chloramphenicol or other agents (MANI82).

The bacteriophage M13 bla 61 (ATCC 37039) is
derived from wild-type M13 through the insertion of the
beta lactamase gene (HINE80). This phage contains 8.13
kb of DNA. M13 bla cat 1 (ATCC 37040) is derived from
M13 bla 61 through the additional insertion of the
chloramphenicol resistance gene (HINE80); M13 bla cat 1
contains 9.88 kb of DNA. Although neither of these
variants of M13 contains the ColE1 origin of
replication, either could be used as a starting point
to construct a usable cloning vector for the present
example.

The OCV for the current example is constructed by
a process illustrated in Figure 8. A brief description
of all the plasmids and phagemids constructed for this
Example is found in Table 17.



For ss oligo-nt site-directed mutagenesis,
multiple primers lead to higher efficiency. Three non-
mutagenic primers are used:

5' (2326) GGC GGC TCT GAG GGT GGC GGT (2352) 3'  wt M13
3'        ccg ccg aga ctc cca ccg cca        5'  olig#24,

5' (4855) GCT GCT GGC TCT CAG CGC CGC (4875) 3'  wt M13
3'        cga cga ccg aga gtc gcg gcg        5'  olig#25,

and

5' (3451) CCG GTG AGC GTG GGT CTC GCG (3431) 3'
3'        ggc cac tcg cac cca gag cgc        5'  olig#26



Olig#24 is complementary to a segment near the end of
M13 gene III; olig#25 is complementary to part of
M13 gene IV (SCHA78). Olig#26 is part of the amp^R gene
from pBR322 (MANI82, Appendix D); the numbers shown
refer to pBR322 base pair numbers. Note that pLG2 and
its derivatives carry the anti-sense strand of the amp^R
gene in the + DNA strand. The segments are picked to
be high in GC content and to divide the pLG7 genome
into several segments of approximately equal length.
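
Two properties of such primers are easy to check by machine:
that the primer is the reverse complement of the strand it
anneals to, and that its GC content is high. The sketch below
does both for the pBR322 segment; the 21-base sequence is a
reading of the garbled listing above and should be treated as an
assumption, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


# Template strand as read 5'->3' in the listing (assumed reading).
TEMPLATE = "CCGGTGAGCGTGGGTCTCGCG"
primer = reverse_complement(TEMPLATE)  # primer written 5'->3'

print(primer)
print(f"GC content: {gc_fraction(primer):.2f}")
```

A primer and its template always share the same GC fraction, so
either strand can be used for the check.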

The genetic engineering procedures needed to
construct the OCV are standard. All restriction
digests use commercially available enzymes and are
carried out under conditions recommended by the
supplier. All restriction fragments of DNA are
purified by HPLC or electrophoresis from agarose gels
as described elsewhere in the present invention.
Competent E. coli are preferably prepared by a modified
version of the procedure of Maniatis (MANI82) given in
the generic detail section. M13 and its engineered
derivatives are infected into E. coli strain PE384
(F+, Rec-, Sup+, AmpS). Plasmid DNA of M13 derivatives is
transformed into E. coli strain PE383 (F-, Rec-,
Sup+, AmpS) so that we avoid multiple infections that
might arise once phage are produced. Isolation of M13
phage is by the procedure of Salivar et al. (SALI64);
isolation of replicative form (RF) M13 is by the
procedure of Jazwinski et al. (JAZW73a and JAZW73b).
Isolation of plasmids containing the ColE1 origin of
replication is by the method of Maniatis (MANI82).

DNA sequencing is by the method of Sanger
(AUSU87). Virions of M13 derivatives contain circular
ss DNA that is called the viral + strand. Base numbers
are assigned from an agreed origin and in ascending
order in the 5'-to-3' direction of the viral + strand.
Conventionally, this DNA is drawn with the 5'-to-3'
direction clockwise and corresponding to increasing
base number. In relation to the genomes of M13
derivatives, we will use "up" or "above" to mean higher
base number or further along in clockwise direction.
Similarly "down" and "below" will mean lower base
number or further along in the counterclockwise
direction. To determine the base sequence of part of
an M13 derivative, one needs a sequencing primer that
is complementary to a region above and within about 100
bases of the region to be sequenced. Because the OCV
is constructed from parts of M13mp18, parts of pBR322,
and synthetic DNA, the sequence of flanking regions is
always known.

We pick the amp^R gene from pBR322 as a convenient
antibiotic resistance gene. Another resistance gene,
such as kanamycin, could be used. (The New England
BioLabs 1988/89 catalogue contains a genetic map of
pBR322 on page 106.) The plasmid pBR322 also contains
the ColE1 origin of replication. The restriction sites
Acc I at 2246 and Aat II at 4286 are the most
convenient places to cut pBR322 to obtain both an
intact amp^R gene and the ColE1 origin of replication
with ends suitable for ligation to other DNA.

The plasmid pBR322 contains a unique AlwN I site
at base 2886 that is between the amp^R gene and ori.
There is a unique AlwN I site in M13mp18 at base 2187.
When the Acc I-to-Aat II fragment of pBR322 is ligated
into M13mp18, there will be two AlwN I sites and no
easy way to excise the amp^R gene. Thus we convert the
AlwN I site of pBR322 into an Xba I site that will be
unique in all the DNA constructs of the present
example. The two oligo-nts:

5' tcggTCTAGActagtcgCCA    3'   olig#60
3' GGTagccAGATCTgatcagc    5'   olig#61
are synthesized by standard methods and annealed. The
AlwN I site at base 2886 in pBR322 has the sequence 5'-
CAGCCACTG-3'. Plasmid pBR322 is cut with AlwN I and
mixed with the synthetic ds DNA and ligated. Cells are
transformed and selected for tetracycline resistance.
Tetracycline resistant colonies are screened for the
correct insert by restriction digestion with Xba I that
cuts the correct construction but not pBR322. The
correct construction is called pLG322. Plasmid pLG322
differs from pBR322 only by the replacement of the
AlwN I site with an Xba I site.

The plasmid pLG322 contains a second Acc I
restriction site at base 651 so that digestion of
pLG322 with Aat II and Acc I yields three fragments:
one of about 2041 bases (that we want), one of about
729 bases, and one of about 1600 bases. To facilitate
isolation of the 2041-base fragment, we also digest
pLG322 with Sty I that cuts at base 1369. The Sty I
cut reduces the 1600-base fragment to two fragments of
about 700 and about 900 bases each. We purify the
2041-nt fragment by HPLC or agarose gel
electrophoresis.
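
The fragment sizes quoted above follow from the cut positions on
a circular molecule, and can be checked with a few lines. This
is a rough sketch: it uses the published pBR322 length of 4361
bp and ignores both the short adaptor that distinguishes pLG322
from pBR322 and the exact offset of each cut within its
recognition site, so results differ from the text by a few
bases. The function name is hypothetical.

```python
def circular_fragments(length, cut_sites):
    """Fragment lengths from cutting a circular molecule.

    Fragments are listed starting at the lowest cut position and
    proceeding toward higher base numbers; the last one wraps
    around the origin.
    """
    cuts = sorted(set(cut_sites))
    following = cuts[1:] + cuts[:1]
    return [(b - a) % length or length for a, b in zip(cuts, following)]


PBR322_LENGTH = 4361  # bp

# Acc I at 651 and 2246, Aat II at 4286
print(circular_fragments(PBR322_LENGTH, [651, 2246, 4286]))

# Adding the cut at 1369 splits the ~1600-base fragment.
print(circular_fragments(PBR322_LENGTH, [651, 1369, 2246, 4286]))
```

The third digest leaves the wanted fragment of about 2040 bases
as the only one larger than 900 bases, which is what makes its
isolation easier.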

M13mp18, sold by New England BioLabs, contains
neither Aat II nor Acc I sites. Therefore we insert an
adaptor that allows us to insert the Aat II-to-Acc I
fragment of pLG322 that carries the amp^R gene and the
ColE1 origin of replication into a desirable place in
M13mp18. M13mp18 contains a lacUV5 promoter and a lacZ
gene that are not useful to the purposes of the present
invention. By cutting M13mp18 with Ava II at the
unique site at 5914 and with Bsu36 I at the unique site
at 6508 and discarding the approximately 600
intervening base pairs, we eliminate all recognition
sites of the enzymes shown in Table 18 from M13mp18.

M13mp18 itself is not cut by the enzymes listed in
Table 19. Among the enzymes in Tables 18 and 19, those
listed in Table 20 have recognition sites within the
Acc I-to-Aat II fragment of pLG322 that contains the
amp^R gene and the ColE1 origin of replication.

Therefore the following adaptor is synthesized,

5' GACCGACGTCtgcctcGTATACCGGACCGcatagctCC      3'  olig#1
3'    GCTGCAGacggagCATATGGCCTGGCgtatcgaGGACT   5'  olig#2
   Ava II | Aat II |   | Acc I | Rsr II |   | Bsu36 I

where the Ava II and Aat II sites share one GC base
pair, and the Acc I and Rsr II sites share a different
CG pair. The two 38-base oligo-nts are synthesized by
a standard procedure described elsewhere in the present
invention; the oligo-nts are annealed to each other.
The bases shown in lower case are spacers. In a later
step, we will cut this adaptor with both Aat II and
Acc I; for both enzymes to cut efficiently, there must
be at least five bases between the sites. Similarly,
we will begin the construction of the osp-ipbd gene by
inserting DNA at the Rsr II and Bsu36 I sites; thus
these sites are separated by seven bases to allow
simultaneous cuts.
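
The spacing rules for the adaptor can be checked mechanically by
locating each recognition site and measuring the gaps between
them. The sketch below assumes one reading of the adaptor's top
strand (the listing is garbled in the source, so the sequence is
an assumption) and uses the standard recognition sequences of
Aat II (GACGTC), Acc I (GTMKAC) and Rsr II (CGGWCCG).

```python
import re

# Top strand of the adaptor, 5'->3', as read from the listing (assumed).
ADAPTOR_TOP = "GACCGACGTCtgcctcGTATACCGGACCGcatagctCC"

# Recognition sequences written as regular expressions.
SITES = {
    "Aat II": "GACGTC",
    "Acc I": "GT[AC][GT]AC",
    "Rsr II": "CGG[AT]CCG",
}


def locate(seq, pattern):
    """Return (start, end) of the first match, 0-based, end exclusive."""
    match = re.search(pattern, seq.upper())
    return (match.start(), match.end()) if match else None


spans = {name: locate(ADAPTOR_TOP, pat) for name, pat in SITES.items()}
gap_aat_acc = spans["Acc I"][0] - spans["Aat II"][1]
# The terminal CC is the adaptor's part of the Bsu36 I site.
gap_rsr_bsu = (len(ADAPTOR_TOP) - 2) - spans["Rsr II"][1]

print(spans)
print("bases between Aat II and Acc I:", gap_aat_acc)
print("bases between Rsr II and Bsu36 I:", gap_rsr_bsu)
```

With this reading, the lower-case spacers account for the
separations described in the text.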

The annealed adaptor is ligated with RF M13mp18
that has been cut with both Ava II and Bsu36 I and
purified by HPLC or polyacrylamide gel electrophoresis
(PAGE). Cells are transformed with the ligated DNA.
DNA from colonies selected on LB agar with ampicillin
is screened by restriction digestion. The desired
construction can be cut with EST H or I, but not 

by any of the enzymes listed in Table 18. Plasmid DNA
from colonies that have the predicted restriction
digestion is sequenced in the region of the insert to
verify the construction. This construction retains
both the Ava II and the Bsu36 I sites. The resulting
construct is called pLG1.

The plasmid pLG1 is grown by standard techniques
and DNA isolated and cut with both Aat II and Acc I.
After ligation, there will still be Aat II and Acc I
restriction sites at the ends of the inserted DNA. The
Aat II-to-Acc I fragment of pBR322 is ligated to the
backbone of pLG1. The ligated DNA is used to transform
competent E. coli that are plated on ampicillin-
containing plates after a short grow-out.

Ampicillin-resistant colonies are picked. Plasmid
DNA of the phagemid from the resistant colonies is
digested with Bsu36 I and Rsr II. To verify the
construction, DNA from phagemids with the correct
restriction digestion pattern is sequenced: a) from
about 20 bases above the Bsu36 I site to about 20 bases
below the Rsr II site, and b) for about 30 bases either
side of the unique Ava II site. The correct construct
is named pLG2.

The Acc I restriction site is no longer needed for
vector construction. To eliminate this site, RF pLG2
dsDNA is cut with Acc I, treated with Klenow fragment
and dATP and dTTP to make it blunt and then religated.
The ligated DNA is used to transform competent cells;
after a short grow-out, ampicillin-resistant colonies
are selected. Restriction digestion is used to screen
phagemid DNA from these colonies; the desired product
cannot be cut with Acc I. To verify the construction,
DNA from colonies lacking an Acc I restriction site is
sequenced from about 20 bases above the former Acc I
site to about 20 bases below it. The cloning vector,
named pLG3, is now ready for stepwise insertion of the
osp-ipbd gene.

We are now ready to design a gene (See Sec. 4)
that will cause BPTI domains to appear on the outer
surface of an M13 derivative: LG7.

To obtain a novel protein domain attached to the
outside of M13, we insert DNA that codes for mature
BPTI after A23 of the precoat protein of M13. Mature
BPTI begins with an arginine residue, which is charged;
cleavage by signal peptidase I is normal in such cases.
Signal peptidase I (SP-I) cuts a chimera of M13 coat
protein and BPTI after A23 leaving mature BPTI attached
at its carboxy end to the amino terminus of M13 CP.

The following amino-acid sequence, called AA_seq2,
is constructed by inserting the sequence for mature
BPTI (shown underscored) immediately after the signal
sequence of M13 precoat protein (indicated by the
arrow) and before the sequence for the M13 CP.

AA_seq2
         1    1    2    2    3    3    4    4    5
    5    0    5    0    5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFARPDFCLEPPYTGPCKARIIRYFYNAKA
                      ^---------------------------

    5    6    6    7    7    8    8    9    9   10
    5    0    5    0    5    0    5    0    5    0
GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAAEGDDPAKAAFNSLQASAT
-------------------------------

   10   11   11   12   12   13
    5    0    5    0    5    0
EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS
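
The construction of the fusion and the residue numbering that
follows from it can be sketched in a few lines. The M13 precoat
and mature BPTI sequences below are the published ones; the
variable and function names are hypothetical.

```python
M13_PRECOAT = ("MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA"
               "MVVVIVGATIGIKLFKKFTSKAS")
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"
SIGNAL_LENGTH = 23  # signal peptidase I cleaves after A23


def build_fusion(precoat, insert, signal_length):
    """Insert a domain between the signal sequence and the mature coat."""
    return precoat[:signal_length] + insert + precoat[signal_length:]


fusion = build_fusion(M13_PRECOAT, BPTI, SIGNAL_LENGTH)

# 1-based position of the first residue of M13 CP in each frame.
pos_in_fusion = SIGNAL_LENGTH + len(BPTI) + 1
pos_in_mature_fusion = len(BPTI) + 1

print(len(fusion), pos_in_fusion, pos_in_mature_fusion)
print(fusion[pos_in_fusion - 1])  # residue that begins M13 CP
```

The same residue therefore carries three numbers depending on
the frame of reference: as coded in the fusion, within M13 CP,
and within the mature fusion after the signal is removed.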



We adopt the convention that sequence numbers of
fusion proteins refer to the fusion, as coded, unless
otherwise noted. Thus the alanine that begins M13 CP
is referred to as "number 82", "number 1 of M13 CP", or
"number 59 of the mature BPTI-M13 CP fusion".

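The numbering convention can be sketched as a small lookup. This is a minimal illustration, assuming a 23-residue signal sequence, a 58-residue mature BPTI, and a 50-residue M13 CP (lengths consistent with the residue numbers quoted in the text); the function name describe is hypothetical.

```python
# Segment lengths assumed from the residue numbers quoted in the text:
# cleavage after A23, BPTI = fusion residues 24-81, M13 CP begins at 82.
SIGNAL_LEN = 23
BPTI_LEN = 58
CP_LEN = 50  # assumed length of mature M13 coat protein


def describe(fusion_number: int):
    """Map a residue number of the fusion, as coded, to
    (segment, number within segment, number in mature fusion)."""
    total = SIGNAL_LEN + BPTI_LEN + CP_LEN
    if not 1 <= fusion_number <= total:
        raise ValueError(f"residue number must be in 1..{total}")
    if fusion_number <= SIGNAL_LEN:
        # Signal-sequence residues are absent from the mature fusion.
        return ("signal", fusion_number, None)
    mature = fusion_number - SIGNAL_LEN
    if mature <= BPTI_LEN:
        return ("BPTI", mature, mature)
    return ("M13 CP", mature - BPTI_LEN, mature)


if __name__ == "__main__":
    for n in (23, 28, 78, 82):
        print(n, describe(n))
```

Under these assumptions residue 82 maps to residue 1 of M13 CP and residue 59 of the mature fusion, which agrees with the three names given above for the same alanine.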
The osp-ipbd gene is regulated by the lacUV5
promoter, so that the level of expression can be
regulated by the concentration of IPTG supplied in the
growth medium. (See Sec. 4.1). The host strain of
E. coli should harbor the lacIq gene that represses the
lacUV5 promoter to a greater extent than lacI+. The
osp-ipbd gene is ended by the trp attenuator so that
RNA polymerase will not read through into subsequent
genes. The osp-ipbd gene is expressed and processed in
parallel with the wild-type gene VIII. The novel
protein, which consists of BPTI tethered to an M13 CP
domain, constitutes only a fraction of the coat.
Affinity separation is able to separate phage carrying
only five or six copies of a molecule that has high
affinity for an affinity matrix (SMIT85); 1%
incorporation of the chimeric protein results in about
30 copies of the protein exposed on the surface. If
this is insufficient, additional copies may be
provided.

Figure 9 shows, in stereo, a hypothetical model of
a short segment of the coat of a derivative of M13 in
which some coat protein monomers are fusions of mature
BPTI to the amino terminus of the normal M13 CP. The
figure shows only protein C-alphas; the DNA, not shown,
lies inside the cylinder. The model of M13 coat is
after the model for f1 of Marvin and colleagues
(BANN81). The BPTI domain is taken from the Brookhaven
Protein Data Bank entry "6PTI" and was attached by
standard model building methods that insure that
covalent bond lengths and angles are close to
acceptable values. The space between the alpha helical
main chains is filled by protein side groups so that
the DNA is protected from solvent. The figure is not
meant to suggest that BPTI fused to M13 CP will adopt
the conformation shown, which is arbitrary. Rather the
model shows that the fusion protein could fit into the
supramolecular structure in a stereochemically
acceptable fashion without disturbing the internal
structure of either the M13 CP or BPTI domain.

The osp-ipbd gene will use: a) the lacUV5
promoter, b) a Shine-Dalgarno sequence having high
homology to natural Shine-Dalgarno sequences, c) a
completely synthetic coding region having codons
assigned to optimize placement of restriction sites,
and d) the trp attenuator as transcriptional
terminator. (See Secs. 4.1 and 4.2).

The ambiguous DNA sequence coding for AA_seq2,
shown in Table 3, is examined by PROSPECT for places
where recognition sites for any of the enzymes listed
in Table 21 could be created without altering the
amino-acid sequence. (See Sec. 4.3). A master table
of enzymes is compiled from the catalogues of enzyme
suppliers listed in Table 4. The enzymes listed in
Table 21 are those that do not cut the OCV, the
construction of which is described above. The codes
used in the ambiguous DNA are shown in Table I.
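The kind of search described here can be illustrated with a short sketch. This is not the PROSPECT program, whose internals are not given; it is a minimal example, assuming the standard genetic code and an unambiguous recognition sequence, that lists the nucleotide offsets at which a site could be placed without altering a given amino-acid sequence. The function names are hypothetical.

```python
# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_allows(aa, codon_index, site, offset):
    """True if some codon for `aa` agrees with `site` wherever the two overlap."""
    for codon, encoded in CODON_TABLE.items():
        if encoded != aa:
            continue
        agrees = True
        for p in range(3):
            s = 3 * codon_index + p - offset  # index into the site
            if 0 <= s < len(site) and codon[p] != site[s]:
                agrees = False
                break
        if agrees:
            return True
    return False


def silent_site_offsets(peptide, site):
    """Nucleotide offsets (0-based) where `site` fits without changing `peptide`."""
    hits = []
    for offset in range(3 * len(peptide) - len(site) + 1):
        first = offset // 3
        last = (offset + len(site) - 1) // 3
        if all(codon_allows(peptide[i], i, site, offset)
               for i in range(first, last + 1)):
            hits.append(offset)
    return hits


if __name__ == "__main__":
    # First ten residues of mature BPTI and the Xho I site CTCGAG.
    print(silent_site_offsets("RPDFCLEPPY", "CTCGAG"))
```

Because each codon can be chosen independently, checking the codons one at a time is sufficient. Sites with ambiguous positions (such as N) would need a small extension to the comparison.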



Using the procedure given in Sec. 4.3, we design an
ipbd gene, such as that shown in Table 2? and in Table
2'J . The recognition sequences of commercially
available enzymes that recognize five or more bases are
shown in Table -i . Some of these enzymes (e.g. Ban I or
Hph I) cut the OCV too often to be of value. A summary
of restriction sites in the designed ipbd gene is given
in Table 24.



The entire DNA sequence of the m13cp-bpti fusion
with annotation appears in Table 25 showing the useful
restriction sites and biologically important features,
viz. the lacUV5 promoter, the lacO operator, the Shine-
Dalgarno sequence, the amino acid sequence, the stop
codons, and the transcriptional terminator.




The ipbd gene is synthesized in several steps
using the method described in Sec. 5.1, generating
dsDNA fragments of 150 to 190 base pairs. In this
example, the 3' overlap window (My) is set to run from
23 to 27, which is generous. The end spacers (Ug) that
are added to insure efficient digestion are set to 6,
which is also generous. Syntheses designed with
smaller overlaps and shorter spacers would allow longer
fragments of dsDNA to be synthesized and consume less
of the reagents. Note, however, that Oliphant and
Struhl (OLIP87) required large excesses of restriction
enzymes to cut their dsDNA; this
could have been because they had set

All DNA synthesis and purification is done by
standard methods as described in Sec. 5.2.

The four steps (See Sec. 6.1) by which we clone
synthetic fragments of the m13cp-bpti gene (the osp-
ipbd gene of the present example) into pLG3 and its
derivatives are illustrated in Figure 20.

The sequence to be introduced into pLG3 is shown
in Table 26 and in Table 27. The segment is 158 bases
long and is synthesized from two shorter synthetic
oligo-nts as described in Sec. 5.1 of the generic
specification. The important features of this segment
are five restriction sites, the lacUV5 promoter, a
Shine-Dalgarno site, and the trpA attenuator as shown
in Table 26.

Table 27 repeats the anti-sense strand shown in
Table 26. The 99 base fragment shown in upper case
letters and underscored (5'-CCGTCC...CCnTCG-3' =
olig#3) is synthesized in the standard manner.
Similarly, the 100 base long fragment of the sense
strand shown in lower case (5'-cgctca....aattg-3' =
olig#4) is synthesized. After annealing, the double-
stranded region is extended with Klenow fragment by the
procedure given above to make the entire 176 bases
double stranded. The overlap region is 23 base pairs
long and contains 14 GC pairs and 9 AT pairs. The DNA
between Avr II and Asu II does not code for anything in
the final ipbd gene; it is there so that the DNA can be
cut by both Avr II and Asu II at the same time in the
next step. This spacer was made rich in G and C so
that annealing of the two single-stranded DNA fragments
will be efficient. Eight bases have been added to the
left of Rsr II and nine bases have been added to the
left of Sau I (same specificity and cutting pattern as
Bsu36 I). These bases at the ends are not part of the
final product; they must be present so that the
restriction enzymes can bind and cut the synthetic DNA
to produce specific sticky ends.
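The arithmetic of this fill-in step can be checked with a brief sketch. It assumes only that the two oligo-nts anneal through their 3' ends and that Klenow extension completes both strands; the helper names and the toy sequences in the usage lines are hypothetical and are not the oligo-nts of Tables 26 and 27.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(top: str, bottom: str) -> int:
    """Length of the longest 3'-terminal overlap between two oligo-nts.

    `top` and `bottom` are both written 5'->3' and lie on opposite strands.
    """
    partner = revcomp(bottom)  # bottom strand in the orientation of `top`
    for k in range(min(len(top), len(partner)), 0, -1):
        if top[-k:].upper() == partner[:k].upper():
            return k
    return 0


def extended_length(len_top: int, len_bottom: int, overlap: int) -> int:
    """Length of the duplex after both 3' ends are extended to completion."""
    if not 0 < overlap <= min(len_top, len_bottom):
        raise ValueError("overlap must be positive and no longer than either oligo-nt")
    return len_top + len_bottom - overlap


if __name__ == "__main__":
    # Lengths quoted in the text: 99 and 100 bases overlapping by 23 base pairs.
    print(extended_length(99, 100, 23))
    # Toy example of locating the overlap from the sequences themselves.
    print(three_prime_overlap("AAAACCGGT", "AAAAACCGG"))
```

With the lengths given above, 99 + 100 - 23 = 176, the figure quoted for the fully double-stranded product.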

The synthetic DNA is cut with both Sau I and
Rsr II and purified by HPLC or PAGE. RF pLG3 is cut
with Sau I and Rsr II and purified by HPLC or agarose
gel electrophoresis and electroelution. The large
piece from the phagemid and the synthetic DNA are
ligated and used to transform E. coli. Ampicillin-
resistant colonies are obtained and plasmids are
screened by endonuclease digestion of RF phagemid DNA.
The desired product can be cut by Avr II, Asu II, or
BstE II, but the original phagemid cannot be cut by
any of these enzymes. To verify the insert, DNA from
isolates that have the correct restriction sites is
sequenced from about 10 bases above the Sau I site to
about 10 bases below the Rsr II site. The construct
with the correct insert is called pLG4.

The second step of the construction of the OCV is
illustrated in Tables 28 and 29. This second segment
of DNA is 155 bases long. As in the construction of
pLG4, two pieces of single-stranded DNA are
synthesized. A 99 base long fragment of the anti-sense
strand (5'-CCACCA...CGTCCG-3' = olig#5) is shown in
upper case letters and underscored; the other piece of
99 bases (5'-gatcta...atcacct-3' = olig#6) is shown
in lower case and is a fragment of the sense strand.
These strands are complementary over 24 bases,
containing 14 GC base pairs and 10 AT base pairs.
Klenow fragment is used to extend in both directions to
produce dsDNA. Both the synthetic dsDNA and RF pLG4
DNA are cut with both Avr II and Asu II and purified by
HPLC or the appropriate type of gel electrophoresis.
The backbone from the phagemid pLG4 and the synthetic
DNA are ligated and used to transform E. coli.
Ampicillin-resistant colonies are obtained and plasmids
are screened by restriction digestion. The desired
product can be cut by any of Afl II, Nhe I, Nru I, Kpn
I, Acc III, Ava I, Xho I, PflM I, Apa I, Dra II, Pss I,
or BssH I, while pLG4 cannot be cut by any of these
enzymes. To verify the insert, DNA from phagemids with
the correct restriction sites is sequenced from about
10 bases above the BstE II site to about 10 bases below
the Avr II site. The construct carrying this second
insert is called pLG5.

Construction of pLG6 proceeds similarly to the
construction of pLG5. The sequences are shown in
Tables 30 and 31. The two single stranded segments
(olig#7 and olig#8) are synthesized, annealed, and
extended with Klenow fragment. The overlap region
comprises 25 base pairs, 15 GC and 10 AT. Both the
synthetic DNA and RF pLG5 are cut with both BssH I and
Asu II, purified, and the appropriate pieces are
ligated and used to transform E. coli. Ampicillin-
resistant colonies are obtained and plasmids are
screened by restriction digestion. The desired
phagemid can be cut with any of Scu I, Acc I, Xca I,
Esq I. III, I, Bbe I, or Nar I, while pLG5 cannot
be cut by any of these enzymes. To verify the
third insert, DNA from phagemids with the correct
restriction map is sequenced from about 10 bases above
the Asu II site to about 10 bases below the BssH I site.
The construct with the correct third insert is called
pLG6.

The construction of pLG7 is illustrated in Tables
32 and 33 and proceeds similarly to the constructions
of pLG4, pLG5, and pLG6. The two single stranded
segments (olig#9 and olig#10) are synthesized,
annealed, and extended with Klenow fragment. Both the
synthetic DNA and RF pLG6 are cut with both Bbe I and
Asu II, purified, and the appropriate pieces are
ligated and used to transform E. coli. Ampicillin-
resistant colonies are screened by restriction
digestion of phagemid RF DNA. The desired phagemid can
be cut with any of Sfi I, Hind III, ^au 1. BstX I, or
Nco I, while pLG6 can be cut by none of these enzymes.
To verify the fourth insert, DNA from phagemids with
the correct restriction sites is sequenced from about
10 bases above the Asu II site to about 10 bases below
the Bbe I site. The construct with the correct fourth
insert is called pLG7; the display of BPTI on the outer
surface of LG7 is verified by the methods of Sec. 8.

M13am429 is an amber mutation of M13 used to
reduce non-specific binding by the affinity matrix for
phages derived from M13. M13am429 is derived by
standard genetic methods (MILL72) from wtM13. M13am429
is grown on E. coli strain PZZSai?'*', ZupZ, ?.ec", A.-r.r-)
and harvested by the standard method.

Phage LG7 is grown on E. coli strain PE364 in LB
broth with various concentrations of IPTG added to the
medium to induce the osp-ipbd gene. Phage LG7 is
obtained from cells grown with 0.0, 0.1, 1.0, 10.0 or
100.0 uM, or 1.0 mM IPTG, harvested (See Sec. 7) by the
method of Salivar (SALI64), and concentrated to obtain

The preferred method of determining whether LG7
displays BPTI on its surface (See Sec. 8) is to
determine whether these phage can retain a labeled
derivative of trypsin (trp) or anhydrotrypsin (AHTrp)
on a filter that allows passage of unbound trp or
AHTrp. Trypsin contains 10 tyrosine residues and can
be iodinated with 125I by standard methods; we denote
the labeled trypsin as "trp*". Labeled anhydrotrypsin
is denoted as "AHTrp*". Other types of labels can be
used on trp or AHTrp, e.g. biotin or a fluorescent
label. AHTrp* or trp* is labeled to an activity of 0.3
uCi/ug. A sample of 10^12 LG7 (10 mM IPTG) is mixed with
1.0 ug of trp* or AHTrp* in 1.0 ml of a buffer of 10 mM
KCl adjusted to pH 8.0 with 1 mM K2HPO4 / KH2PO4. The
mixture is passed through an Amicon MPS-1 system fitted
with a membrane filter that allows passage of proteins
smaller than ^00,000. Filters are soaked in
buffer containing trp or AHTrp prior to the analysis.
The filter is washed twice with 0.5 ml of buffer
containing trp or AHTrp. The radioactivity retained on
the filter is quantitated with a scintillation counter
or other suitable device. If each virion displays one
copy of BPTI, then .05 ug of protein can be bound that
would give rise to 3 x 10^4 disintegrations / minute on
the filter.
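The order of magnitude of this estimate can be reproduced with a short calculation. The sketch below assumes a sample of 10^12 virions, one BPTI per virion binding one labeled trypsin, a specific activity of 0.3 uCi/ug, and a trypsin molecular weight of about 23,300 (an assumed round value that is not stated in the text); the function name is hypothetical.

```python
AVOGADRO = 6.022e23
DPM_PER_UCI = 2.22e6  # 1 uCi = 3.7e4 decays per second = 2.22e6 per minute


def bound_label(virions, copies_per_virion, protein_mw, activity_uci_per_ug):
    """Return (micrograms of labeled protein bound, disintegrations per minute)."""
    moles = virions * copies_per_virion / AVOGADRO
    micrograms = moles * protein_mw * 1e6  # grams -> micrograms
    dpm = micrograms * activity_uci_per_ug * DPM_PER_UCI
    return micrograms, dpm


if __name__ == "__main__":
    ug, dpm = bound_label(
        virions=1e12,            # assumed sample size
        copies_per_virion=1,     # one BPTI displayed per virion
        protein_mw=23300,        # assumed molecular weight of trypsin
        activity_uci_per_ug=0.3,
    )
    print(f"{ug:.3f} ug bound, {dpm:.2e} dpm")
```

Under these assumptions the bound protein comes to a few hundredths of a microgram and the count rate to a few times 10^4 disintegrations per minute, the same order as the figures quoted above.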

An alternative way to quantitate display of BPTI
on the surface of LG7 is to use the stoichiometric
binding between trypsin and BPTI to titrate the BPTI.
A solution that titers 10^12 pfu/ml of a phage is
approximately 1.6 x 10^-9 M in phage if each virion is
infective. The ratio of pfu to total phage can be
determined spectrophotometrically using the molar
extinction coefficients at 260 nm and 280 nm corrected
for the increased length of LG7 as compared to wtM13.
For example, if a 1.0 ml solution that contains 10^12
pfu of LG7 phage grown with 1.0 mM IPTG inhibits
trypsin solutions up to 4.8 x 10^-8 M, we calculate that
there are approximately 30 BPTIs/GP (= (4.8 x 10^-8
moles of BPTI/l)/(1.6 x 10^-9 moles of phage/l)). Inhibition
of a specified concentration of trypsin is most easily
measured spectrophotometrically using a peptide-linked
dye, such as N-alpha-benzoyl-Arg-Nan (TSCH87).
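The titration arithmetic can be written out as follows. This is a minimal sketch, assuming a titer of 10^12 pfu/ml, that every virion is infective unless stated otherwise, and that 4.8 x 10^-8 M is the highest trypsin concentration inhibited; the function names are hypothetical.

```python
AVOGADRO = 6.022e23


def phage_molarity(pfu_per_ml, infective_fraction=1.0):
    """Molar concentration of virions for a given titer.

    `infective_fraction` is the ratio of pfu to total phage (1.0 if every
    virion is infective).
    """
    particles_per_liter = pfu_per_ml * 1000.0 / infective_fraction
    return particles_per_liter / AVOGADRO


def bpti_per_virion(trypsin_inhibited_molar, phage_molar):
    """Copies of BPTI per virion, assuming 1:1 trypsin:BPTI binding."""
    return trypsin_inhibited_molar / phage_molar


if __name__ == "__main__":
    m = phage_molarity(1e12)
    print(f"phage concentration: {m:.2e} M")
    print(f"BPTI per virion: {bpti_per_virion(4.8e-8, m):.0f}")
```

If only a fraction of the virions are infective, passing that fraction as infective_fraction raises the particle concentration and lowers the estimated number of copies per virion accordingly.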
Alternatively, binding to an affinity column may
be used to demonstrate the presence of BPTI on the
surface of phage LG7. An affinity column of 2.0 ml
total volume having BioRad Affi-Gel 10(TM) matrix and
30 mg of AHTrp as affinity material is prepared by the
method of BioRad. The void volume (Vv) of this column
is, by hypothesis, 1.0 ml. This affinity column is
denoted {AHTrp}.

A sample of 10" M13am429 is applied to {AHTrp} in
1.0 ml of 10 mM KCl buffered to pH 8.0 with KH2PO4 /
K2HPO4. The column is then washed with the same buffer
until the optical density at 280 nm of the effluent
returns to base line or 4 x Vv have been passed through
the column, whichever comes first. Samples of LG7 or
LG10 are then applied to the blocked {AHTrp} column at
10^12 pfu/ml in 1.0 ml of the same buffer. The column
is then washed again with the same buffer until the
optical density at 280 nm of the effluent returns to
base line or 4 x Vv have been passed through, whichever
comes first. Following this wash, a gradient of KCl
from 10 mM to 2 M in 3 x Vv, buffered to pH 8.0 with
phosphate, is passed over the column. The first KCl
gradient is followed by a KCl gradient running from 2 M
to 5 M in 3 x Vv. The second KCl gradient is followed
by a gradient of guanidinium Cl from 0.0 M to 2.0 M in
2 x Vv in 5 M KCl and buffered to pH 8.0 with
phosphate. Fractions of 50 ul are collected and
assayed for phage by plating ul of each fraction at
suitable dilutions on sensitive cells. Retention of
phage on the column is indicated by appearance of LG7
phage in fractions that elute significantly later from
the column than control phage LG10 or wtM13. A
successful isolate of LG7 that displays BPTI is
identified, the bpti insert and junctions are
sequenced, and this isolate is used for further work
described below. It is likely that a significant
fraction of clonal isolates from the same ligation that
are characterized as identical by restriction digestion
will similarly display BPTI.

If vgDNA is used to obtain a functional fusion
between a BPTI mutant and M13 CP (vide infra), then DNA
from a clonal isolate is sequenced in the regions that
were variegated. Then gratuitous restriction sites for
useful restriction enzymes are removed if possible by
silent codon changes as follows. A de novo piece of
synthetic DNA is synthesized such that the selected
amino acid sequence is preserved and cloned into pLG7.
The sequence numbers of residues in OSP-IPBD will be
changed by any insertions; hereinafter, we will,
however, denote residues inserted after residue 23 as
23a, 23b, etc. Insertions after residue 81 will be
denoted as 81a, 81b, etc. This preserves the numbering
of residues between C5 in BPTI and C55 in BPTI.
Residue C5 of BPTI is always denoted as 28 in the
fusion; residue C55 of BPTI is always denoted as 78 in
the fusion, and the intervening residues have constant
numbers.

Should LG7 phage from cells grown with 10 mM IPTG
fail to display BPTI on its surface, we have several
options. We might try to determine why the
construction failed to work as expected. There are
various possible modes of failure, including: a) BPTI
is not cleaved from the M13 signal sequence, b) BPTI is
cleaved from the M13 CP, and c) the chimeric protein is
made and cleaved after the signal sequence, but the
processed protein is not incorporated into the M13
coat. BPTI has been secreted from E. coli (MARK86);
however the M13 coat-protein signal sequence was not
used. Therefore problems stemming from the signal
sequence are unlikely, but possible. We could
determine whether BPTI was present in the periplasm or
bound to the inner membrane of LG7-infected cells by
assays using labeled trypsin or anhydrotrypsin.

Proteins in the periplasm can be freed through
spheroplast formation using lysozyme and EDTA in a
concentrated sucrose solution (BIRD67, MALA64). If
BPTI were free in the periplasm, it would be found in
the supernatant. Trypsin labeled with 125I would be
mixed with supernatant and passed over a non-
denaturing molecular sizing column and the radioactive
fractions collected. The radioactive fractions would
then be analyzed by SDS-PAGE and examined for BPTI-
sized bands by silver staining.

Spheroplast formation exposes proteins anchored in
the inner membrane. Spheroplasts would be mixed with
AHTrp* and then either filtered or centrifuged to
separate them from unbound AHTrp*. After washing with
hypertonic buffer, the spheroplasts would be analyzed
for extent of AHTrp* binding.

If BPTI were found free in the periplasm, then we
would expect that the chimeric protein was being
cleaved both between BPTI and the M13 mature coat
sequence and between BPTI and the signal sequence. In
that case, we should alter the BPTI/M13 CP junction by
inserting vgDNA at codons for residues 78-82 of
AA_seq2.

If BPTI were found attached to the inner membrane,
then two hypotheses can be formed. The first is that
the chimeric protein is being cut after the signal
sequence, but is not being incorporated into LG7
virion; the treatment would also be to insert vgDNA
between residues 78 and 82 of AA_seq2. The alternative
hypothesis is that BPTI could fold and react with
trypsin even if signal sequence is not cleaved. N-
terminal amino acid sequencing of trypsin-binding
material isolated from cell homogenate determines what
processing is occurring. If signal sequence were being
cleaved, we would use the procedure above to vary
residues between C78 and A82; subsequent passes would
add residues after residue 81. If signal sequence were
not being cleaved, we would vary residues between 23
and 27 of AA_seq2. Subsequent passes through that
process would add residues after 23.

If BPTI were found neither in the periplasm nor on
the inner membrane, then we would expect that the fault
was in the signal sequence or the signal-sequence-to-
BPTI junction. The treatment in this case would be to
vary residues between 23 and 27.
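The diagnostic reasoning of the preceding paragraphs can be summarized as a small decision function. This is only a restatement of the text in code form; the function name and the string labels for the three possible findings are hypothetical.

```python
def region_to_variegate(bpti_location, signal_cleaved=None):
    """Return the (first, last) residues of AA_seq2 bounding the region to vary.

    bpti_location: "periplasm", "inner membrane", or "not found".
    signal_cleaved: result of N-terminal sequencing; needed only when BPTI
    is found on the inner membrane.
    """
    if bpti_location == "periplasm":
        # Cleaved at both junctions: alter the BPTI/M13 CP junction.
        return (78, 82)
    if bpti_location == "inner membrane":
        if signal_cleaved is None:
            raise ValueError("N-terminal sequencing result is required")
        # Cleaved but not incorporated -> BPTI/CP junction;
        # not cleaved -> signal-sequence/BPTI junction.
        return (78, 82) if signal_cleaved else (23, 27)
    if bpti_location == "not found":
        # Fault lies in the signal sequence or its junction with BPTI.
        return (23, 27)
    raise ValueError(f"unknown location: {bpti_location!r}")


if __name__ == "__main__":
    print(region_to_variegate("periplasm"))
    print(region_to_variegate("inner membrane", signal_cleaved=False))
    print(region_to_variegate("not found"))
```

Every branch leads to one of only two regions, residues 78 to 82 or residues 23 to 27.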
wrong take time and effort and, for the foreseen
outcomes, indicate variations in only two regions.
Therefore, we believe it prudent to try the synthetic
experiments described below without doing the analysis.

For example, these six experiments that introduce
variegation into the bpti-gene VIII fusion could be
tried:

1) 3 variegated codons between residues 78 and 82
using olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27
using olig#14 and olig#15,

3) 5 variegated codons between residues 78 and 82
using olig#13 and olig#12a,

4) 5 variegated codons between residues 23 and 27
using olig#15 and olig#14a,

5) 7 variegated codons between residues 78 and 82
using olig#13 and olig#12b, and

6) 7 variegated codons between residues 23 and 27
using olig#15 and olig#14b.



To alter the BPTI-M13 CP junction, we introduce
DNA variegated at codons for residues between 78 and 82
into the Sph I and Sfi I sites of pLG7. The residues
after the last cysteine are highly variable in amino
acid sequences homologous to BPTI, both in composition
and length; in Table 25 these residues are denoted as
G79, G80, and A81. The first part of the M13 CP is
denoted as A82, E83, and G84. One of the oligo-nts
olig#12, olig#12a, or olig#12b and the primer olig#13
are synthesized by standard methods. The oligo-nts
are:



residue       75  76  77  78  79  80  81  82  83
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|GCT|GAA|-
   84  85  86  87  88  89  90  91
   GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-
   82  83  84  85  86  87
   GCT|GAA|GGT|GAT|GAT|CCG|-
   88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12a

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-
   81c 81d 82  83  84  85  86  87
   qfk|qfk|GCT|GAA|GGT|GAT|GAT|CCG|-
   88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12b

residue   91  90  89  88  87  86
5' gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3'   olig#13
where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. The
primer is complementary to the 3' end of each of the
longer oligo-nts. One of the variegated oligo-nts and
the primer olig#13 are combined in equimolar amounts
and annealed. The dsDNA is completed with all four
(nt)TPs and Klenow fragment. The resulting dsDNA and
pLG7 are cut with both Sfi I and Sph I, purified,
mixed, and ligated. This ligation mixture goes through
the process described in Sec. 15 in which we select a
transformed clone that, when induced with IPTG, binds
AHTrp.
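The amino-acid distribution produced by one qfk codon follows directly from the base mixtures. The sketch below assumes the standard genetic code and takes the compositions as q = (0.26 T, 0.18 C, 0.26 A, 0.30 G), f = (0.22 T, 0.16 C, 0.40 A, 0.22 G), and k = equal parts T and G; the function name is hypothetical.

```python
from collections import defaultdict

# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Base mixtures for the three codon positions (assumed compositions).
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.50, "G": 0.50}


def codon_distribution(first, second, third):
    """Probability of each amino acid ('*' = stop) for one variegated codon."""
    dist = defaultdict(float)
    for b1, p1 in first.items():
        for b2, p2 in second.items():
            for b3, p3 in third.items():
                dist[CODON_TABLE[b1 + b2 + b3]] += p1 * p2 * p3
    return dict(dist)


if __name__ == "__main__":
    dist = codon_distribution(Q, F, K)
    for aa, p in sorted(dist.items(), key=lambda item: -item[1]):
        print(f"{aa}  {p:.4f}")
```

With k restricted to T and G, the only stop codon that can arise is TAG, so the chance of a stop at each variegated codon is 0.26 x 0.40 x 0.5, about 5.2%, while all twenty amino acids remain possible.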

To vary the junction between M13 signal sequence
and BPTI, we introduce DNA variegated at codons for
residues between 23 and 27 into the Kpn I and Xho I
sites of pLG7. The first three residues are highly
variable in amino acid sequences homologous to BPTI.
Homologous sequences also vary in length at the amino
terminus. One of the oligo-nts olig#14, olig#14a, or
olig#14b and the primer olig#15 are synthesized by
standard methods. The oligo-nts are:



residue      17  18  19  20  21  22  23  24  25
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|-
   26  27  28  29  30
   qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'   olig#14

residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk|-
   26a 26b 27  28  29  30
   qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'   olig#14a

residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|qfk|qfk|qfk|-
   26a 26b 26c 26d 27  28  29  30
   qfk|qfk|qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'   olig#14b

5' tcg|cgg|gcg|CTC|GAG|ACA|GAA 3'   olig#15

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. One of
the variegated oligo-nts and the primer are combined in
equimolar amounts and annealed. The dsDNA is
completed with all four (nt)TPs and Klenow fragment.
The resulting dsDNA and RF pLG7 are cut with both Kpn I
and Xho I, purified, mixed, and ligated. This ligation
mixture goes through the process described in Sec. 15
in which we select a transformed clone that, when
induced with IPTG, binds AHTrp or trp.

Other numbers of variegated codons could be used.

If none of these approaches produces a working
chimeric protein, we may try a different signal
sequence. If that doesn't work, we may try a different
OSP in M13 because the structural data clearly indicate
that BPTI could not be joined to the carboxy terminus.
The next best OSP of M13 is the gene III protein
because there is fusion data (SMIT85, CRUZ88).



Example I, Part II

BPTI binds very tightly to trypsin
(Kd = 6.0 x 10^-14 M) and to anhydrotrypsin, so that
these molecules are not preferred for optimizing the
amount of BPTI to display on LG7 or the amount of
affinity molecule to attach to the column. Tschesche
et al. reported on the binding of several BPTI
derivatives to various proteases:

Dissociation constants for BPTI derivatives, Molar.



Residue     Trypsin        Chymotrypsin   Elastase     Elastase
#15         (bovine        (bovine        (porcine     (human
            pancreas)      pancreas)      pancreas)    leukocytes)

lysine      6.0 x 10^-14   9.0 x 10^-9
glycine
alanine
valine
leucine



2.8 X 10"** 
5.7 X 10"3 

1.9 X 10"^ 



5 X 10 



-6 



0 X 10" 



5 X 10 



1 xi; 



-10 



9 X 10" 



From the report of Tschesche et al. we infer that
molecular pairs marked have Kds greater than
3.5 x 10^-6 M and that molecular pairs marked have Kds
much greater than 3.5 x 10^-6 M. Because of the
wealth of data about the binding of BPTI and various
mutants to trypsin and other proteases (TSCH87), we can
proceed in various ways. (For other PBDs we can obtain
two different monoclonal antibodies, one with a high
affinity having Kd of order 10"^* M, and one with a
moderate affinity having Kd on the order of 10"? M.)
In this example, we may use: a) the moderate binding
between BPTI and human leukocyte elastase (HuLE1), b)
the moderately strong binding of porcine elastase to
BPTI(V15), or c) the binding of BPTI(^15) (residue 38
in the ipbd gene) for trypsin (weak but detectable) or
for porcine pancreatic elastase.

Following the teachings of Sec. 10, we compare the
retention of LG7 virions to the retention of wild-type
M13 on {AHTrp}. M13 derivatives having more DNA than
wild-type M13 have correspondingly longer virions. Thus
we will create pLG8 that differs from pLG7 only in
having stop codons at codons 2 and 3, and an altered L
codon at codon 7 of the osp-ipbd gene. Phage LG8 will
have exactly as much DNA as LG7; therefore the LG8
virion is exactly as long as the LG7 virion. LG8 can
not, however, display BPTI on its surface. To generate
these mutations we synthesize the oligo-nt

5' (121) aac|gct|agc|ctt|Cag|aac|cag|aga|ttA|ctA|cat|-
         agt|gag|cct| (80) 3'   olig#11

that is complementary to bases 80 through 121 of the
ipbd gene, shown in Table 23, except for the three
upper-case, underscored bases. Olig#11 and the
primers olig#24, olig#25, and olig#26 are annealed to
circular ssDNA from LG7. Klenow fragment (from US
Biochemical) and all four (nt)TPs are used to complete
the circular dsDNA. After treatment with Klenow
fragment, the dsDNA is treated with ligase. Cells are
transformed with the ligated dsDNA and, after a short
growout, the cells are plated on ampicillin-containing
LB agar. By changing the third base in codon 7, we
have destroyed the unique Afl II site in pLG7. Thus we
can screen colonies for loss of the Afl II site. To
confirm the construction, DNA from plaques with no
Afl II site is sequenced from about base :-;0 to about
base 40 of the osp-ipbd gene.
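The effect of the mutagenic oligo-nt can be checked with a short sketch. It reads olig#11 as aac gct agc ctt Cag aac cag aga ttA ctA cat agt gag cct and infers the wild-type sequence by assuming that only the three upper-case bases differ; both readings are assumptions, as is the use of the standard genetic code. The constant and function names are hypothetical.

```python
# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Mutagenic oligo-nt (assumed reading) and the inferred wild-type equivalent.
OLIG11 = "aacgctagccttCagaaccagagattActAcatagtgagcct"
WILD_TYPE = "aacgctagccttaagaaccagagatttcttcatagtgagcct"
AFL_II = "CTTAAG"


def sense_strand(antisense_oligo):
    """Reverse complement, i.e. the coding strand covered by the oligo-nt."""
    return antisense_oligo.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate whole codons of an upper-case coding sequence."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    for name, oligo in (("wild type", WILD_TYPE), ("olig#11", OLIG11)):
        sense = sense_strand(oligo)
        start = sense.index("ATG")  # first codon of the gene
        print(name, translate(sense[start:]), "Afl II site:", AFL_II in sense)
```

Under these assumptions the mutant coding strand has stop codons at positions 2 and 3 and lacks the Afl II site, while the inferred wild-type strand encodes the start of the signal sequence and retains the site.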

To expedite identification of different M13-
derived phage, we replace the amp^R gene of LG8 with the
tet^R gene from pBR322. Plasmid pBR322 is cut with
Bsm I at the unique site at 1353 and the linearized DNA
is blunted with Klenow fragment and purified. The
blunt DNA is cut with Aat II and the l^:3-base tet^R-
bearing fragment purified by agarose gel
electrophoresis or HPLC. Plasmid pLG8 dsDNA is cut
with Xbi I at the unique site and the linearized DNA is
blunted with Klenow fragment and purified. The linear,
blunt DNA is digested with Aat II and the 7.3 kb
fragment is isolated. The two isolated DNA fragments
are mixed, annealed, ligated, and used to transform
competent E. coli cells. The transformed cells are
selected with tetracycline. The correct construction
contains Sal I, EcoR I, and EcoR V sites, but LG8
contains none of these. The correct construction,
having 9.2 kb, is easily distinguished from pBR322 and
is called LG10. DNA from phage LG10 is sequenced in
the vicinity of the junctions of the newly inserted
tet^R gene to confirm the construction.

The phage LG7 is grown at various levels of IPTG
in the medium and harvested in the way previously
described. An affinity column having bed volume of 2.0
ml and supporting an amount of HuLE1 picked from the
range 0.1 mg to 30.0 mg on 1 ml of BioRad Affi-
Gel 10(TM) or Affi-Gel 15(TM) is designated {HuLE1}.
An appropriate set of densities of HuLE1 on the column
is (0.1 mg/ml, 0.5 mg/ml, 2.0 mg/ml, 3.0 mg/ml, 13.3
mg/ml, and 30.0 mg/ml). The Vv of {HuLE1} is, by
hypothesis, 1.0 ml. The elution of LG7 phage is
compared to the elution of LG10 on {HuLE1} having
varying amounts of HuLE1 affixed. The columns are
eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x Vv, whichever is first,

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH
held at 8.0 with phosphate,

3) a gradient of 2 M to 5 M KCl in 3 x Vv,
phosphate buffer to pH 8.0,

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x Vv, with phosphate buffer to pH 8.0.

The preferred level of induction (IPTG optimal) and
amount of affinity molecule on the matrix
(DoAMoM optimal) are those settings that give the
sharpest LG7 elution peak that shows significant
retardation as compared to LG8, which carries no BPTI.
By hypothesis, the best separation occurs for the
amount of BPTI/GP produced when the cells are induced
with 10.0 uM IPTG and when 4.0 mg HuLE1/ml is applied
to BioRad Affi-Gel 10(TM).

When the amount of BPTI/GP and the amount of
HuLEl/volume of support have been optimized, we turn to
optimization of elution rate, initial ionic strength,
and the amount of GP/(volume of support). These
parameters can be optimized separately.

Using optimal BPTI/GP and HuLEl/volume of support,
we measure the elution volume of LG7 and LG10 for
different elution rates, viz. 1, 1/2, 1/4, 1/8, and 1/16
times the maximum flow rate. M13 is shear resistant,
so that the pressure that can be applied across the
column is limited only by the mechanical properties of
the support material. By hypothesis, 1/4 of maximum
elution rate is better than 1/2, but 1/8 is about the
same as 1/4. Therefore 1/4 maximum elution rate will
be used.




Elution volumes of LG7 obtained from cells grown
on media that is 2.0 mM in IPTG are measured at optimal
DoAMoM and elution rate for loadings of 10^9, 10^10,
10^11, and 10^12 pfu. By hypothesis, 10^12 pfu of pure
LG7 overloads the column and a significant number of
phage elute before their characteristic position in the
KCl gradient. We also find that 10^11 pfu overloads the
column only slightly, and that 10^10 pfu does not
overload the column. Because the use of the affinity
separation in Sec. 15 will involve a population in
which no single member is more than one part in 10^[?],
we conclude that 10^12 pfu of a variegated population
could be applied to a column of 1.0 ml matrix volume
without overloading with respect to any one species. The
overloading of a 1.0 ml column by 10^12 pfu also
indicates that the initial column that captures
indiscriminately adhesive phage should be 5 to 10 times
as large as the column that supports the target
material.

Elution volumes of LG7 and LG10 obtained from
cells grown on media that is 2.0 mM in IPTG are
measured at optimal DoAMoM and elution rate and for a
loading of 10^10 pfu for various initial ionic
strengths: 1.0 mM, 5.0 mM, 10.0 mM, 20.0 mM, and 50.0
mM. We find that LG10 is slightly retarded by the
column when loaded at 1.0 mM KCl, but that LG7 always
comes off the column at its characteristic place in the
gradient. We use 10.0 mM as initial ionic strength in
all remaining affinity separations.

To determine the sensitivity of chromatography of
phage that display variants of BPTI on their surfaces
(Sec. 10.1), we prepare artificial mixtures of two
closely-related phage that differ only at one residue
in the BPTI domain. One variety of phage has strong
affinity for the column used in this step, while the
other phage has no affinity for the column. We
chromatograph these mixtures to discover how little of
the phage that binds to the column can be detected
within a large majority of phage that do not bind the
column.

For these tests we choose AHTrp as AfM(BPTI). A
column having 2 ml bed volume is prepared with
(DoAMoM_optimal of AHTrp)/(ml of Affi-Gel 10(TM)). The
column is called {AHTrp} and has V_V = 1.0 ml.



A new phage, LG9, is prepared that displays
BPTI(V15) as IPBD in contrast to LG7 that displays
BPTI(K15, wild-type) as IPBD. Residue 15 of BPTI is
residue 38 of the osp-ipbd gene. We introduce the
change K38 to V by replacement of a short segment of
the osp-ipbd gene. The two oligo-nts

    TAC AAC CCT A[?] CCA CC 3'        olig#16
    atg ttg cga ttt cgt cc [?] 5'     olig#17

are synthesized by standard methods and annealed; the
lower case letters in olig#16 and the upper case
letters in olig#17 are mutant with respect to pLG7.
Plasmid pLG7 DNA is digested with both Apa I and Stu I
and the large piece purified. The ds oligo-nt is added
to the purified backbone of pLG7 and ligated; the
ligated DNA is used to transform competent cells.

After a short grow out, the cells are plated on
ampicillin-containing plates and Amp^R colonies are
picked. The mutations destroy the unique BssH II site,
thus we can screen colonies through restriction
digestion. To confirm the construction, DNA from
colonies having the correct restriction digestion
pattern is sequenced from about 10 bases above the
Stu I site to about 10 bases below the Apa I site. The
correct construction is called pLG9.

To expedite differentiation between LG7 and an
LG9-derivative phage, we replace the amp^R gene of LG9
with the tet^R gene from pBR322. Plasmid pBR322 is cut
with Bsm I at the unique site at 1353 and the
linearized DNA is blunted with Klenow fragment and
purified. The blunt DNA is cut with Aat II and the
1.42 kb tet^R-bearing fragment purified by agarose gel
electrophoresis or HPLC. Plasmid pLG9 ds DNA is cut
with Xba I at the unique site and the linearized DNA is
blunted with Klenow fragment and purified. The linear,
blunt DNA is digested with Aat II and the 7.8 kb
fragment is isolated. The two isolated DNA fragments
are mixed, annealed, ligated, and used to transform
competent E. coli cells. The transformed cells are
selected with tetracycline. The correct construction
contains [?] I, EcoR I, and EcoR V sites, but LG9
contains none of these. The correct construction,
having 9.2 kb, is easily distinguished from pBR322 and
is called LG11. DNA from phage LG11 is sequenced in
the vicinity of the junctions of the newly inserted
tet^R gene to confirm the construction.



LG7 and LG11 are grown with optimum IPTG (2.0 mM)
and harvested. Mixtures are prepared in the ratios
1 LG7 : V_lim LG11, where V_lim ranges from 10^[?] to
10^[?] by factors of 10. Large values of V_lim are
tested first; once a V_lim is found that allows
recovery of LG7, smaller values of V_lim are not
tested. Once a value of V_lim is found that allows
recovery of LG7, we test values that are larger by 2-
or 3-fold so that V_lim is determined within a factor
of 2.

The column {AHTrp} is first blocked by treatment
with 10^[?] virions of M13am429 in 100 ul of 10 mM KCl
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD260 returns to base line
or 4 x V_V have passed through the column, whichever
comes first. One of the mixtures of LG7 and LG11
containing 10^12 pfu in 1 ml of the same buffer is
applied to {AHTrp}. The column is eluted in a standard
way:



1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x V_V, whichever is first (discard effluent),

2) a gradient of 10 mM to 2 M KCl in 3 x V_V, pH
held at 8.0 with phosphate (30 x 100 ul
fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_V,
phosphate buffer to pH 8.0 (30 x 100 ul
fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x V_V, with phosphate buffer to pH 8.0 (20 x
100 ul fractions),

5) constant 5 M KCl plus 0.8 M guanidinium Cl in
1.2 x V_V, with phosphate buffer to pH 8.0 (12 x
100 ul fractions).

Samples of 4 ul from each fraction are placed at
suitable dilution on phage-sensitive Sup^- cells (so
that M13am429 will not grow). In addition to the
effluent fractions, a sample is removed from the column
and used as an inoculum for phage-sensitive Sup^- cells.
Plaques are transferred to ampicillin-containing LB
agar. Colonies that are ampicillin-resistant are
tested for display of BPTI(K15) by use of trp* or
AHTrp*. Testing begins with colonies obtained by
culturing an inoculum from the column, proceeds to the
last effluent fraction, and works backwards toward
earlier fractions. Once a positive colony is found, no
further tests are required for that value of V_lim. If
no BPTI positive colonies are detected, the population
of phage obtained from the column matrix and the last
few (e.g. 5 to 10) phage-bearing fractions are merged
and cultured. Phage are harvested from this culture
and chromatographed by the above procedure. This
process continues until a positive colony is isolated
or N_chrom passes of chromatography and growth have been
completed. If no positive colonies are detected after
N_chrom passes of enrichment, V_lim is reduced by a
suitable factor and the process is repeated.
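
The fraction counts in the five-step elution schedule follow directly from the step volumes, the void volume, and the fraction size. The sketch below is illustrative only: it assumes V_V = 1.0 ml and 100 ul fractions as stated for the {AHTrp} column, and the step labels and function name are hypothetical. The count for step 5 is computed from its volume (1.2 x V_V) rather than read from the text.

```python
# Elution schedule for the {AHTrp} column: (label, volume in units of V_V,
# whether fractions are collected). Step 1 effluent is discarded.
VOID_VOLUME_UL = 1000.0  # V_V = 1.0 ml, by hypothesis in the text
FRACTION_UL = 100.0      # fraction size given in the schedule

SCHEDULE = [
    ("1: 10 mM KCl wash", 4.0, False),
    ("2: 10 mM to 2 M KCl gradient", 3.0, True),
    ("3: 2 M to 5 M KCl gradient", 3.0, True),
    ("4: 5 M KCl, 0 to 0.8 M guanidinium Cl", 2.0, True),
    ("5: 5 M KCl, 0.8 M guanidinium Cl", 1.2, True),
]


def fractions_per_step(schedule, void_volume_ul, fraction_ul):
    """Return the number of collected fractions for each collected step."""
    counts = {}
    for label, n_void_volumes, collected in schedule:
        if collected:
            counts[label] = round(n_void_volumes * void_volume_ul / fraction_ul)
    return counts


counts = fractions_per_step(SCHEDULE, VOID_VOLUME_UL, FRACTION_UL)
for label, n in counts.items():
    print(f"step {label}: {n} fractions")
print("total fractions to plate:", sum(counts.values()))
```

With 4 ul sampled from each fraction, the total number of platings per run is the sum of these counts plus the sample taken from the column matrix.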



By hypothesis, V_lim = 4.0 x 10^8 is the largest
value for which LG7 can be recovered. Thus C_sensi =
4.0 x 10^8. Three cycles of chromatography are required
to isolate LG7, so the first approximation to C_eff is
740 ( = exp( log_e(4.0 x 10^8)/3 ) ).
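
As a check on the arithmetic, the per-cycle enrichment can be computed as the n-th root of the overall sensitivity. This is a minimal sketch; the function name is hypothetical, and the exponent 8 is the value that makes the cube root agree with the quoted figure of 740.

```python
import math


def per_cycle_enrichment(c_sensi, n_cycles):
    """Geometric-mean enrichment per cycle: exp(ln(c_sensi) / n_cycles)."""
    return math.exp(math.log(c_sensi) / n_cycles)


c_eff = per_cycle_enrichment(4.0e8, 3)
print(f"first approximation to C_eff: {c_eff:.0f}")  # rounds to 740 at two figures
```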




We now determine the efficiency of the affinity
separation (Sec. 10.2). This is done by: a) preparing
mixtures of LG7 and LG11 in the ratio 1:Q, b) enriching
the population for LG7 for one separation cycle, and c)
determining the fraction of LG7 in the last phage-
bearing fraction. The phage are obtained from cultures
induced at 10.0 uM IPTG, the optimal level. Q is
decreased until roughly half the phage are LG7. We
start with Q = 1.5 x 10^4 = 20 x approximate C_eff. The
mixture is applied to a {AHTrp} column bearing 4.0 mg
AHTrp on 1.0 ml of Affi-Gel 10 (the optimal DoAMoM) and
eluted in the specified manner. A sample of 4 ul from
each fraction is plated at suitable dilution on phage-
sensitive cells on LB agar. The identity of colonies
in the last phage-bearing fraction is determined by
transferring colonies to ampicillin-containing and
tetracycline-containing plates; colonies that show Tet^R
are from LG11 and colonies that show Amp^R are from LG7.
When Q is 1.5 x 10^4, [?]% of colonies are BPTI positive.
When Q is 1.5 x 10^3, 60% of the colonies are BPTI
positive. Thus we calculate C_eff = 0.60 x 1.5 x 10^3 =
900.
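
The single-cycle estimate uses the product of the starting dilution Q and the fraction of positive colonies found after one pass. A short sketch of that calculation follows; the function name is hypothetical, and the formula is the approximation used in the text, not an independent model of the column.

```python
import math


def single_cycle_efficiency(q, fraction_positive):
    """Estimate C_eff as (fraction of positive colonies) x Q, as in the text."""
    return fraction_positive * q


starting_q = 20 * 740          # 20 x the first approximation to C_eff
c_eff_measured = single_cycle_efficiency(1.5e3, 0.60)
print(starting_q)              # 14800, i.e. about 1.5 x 10^4
print(round(c_eff_measured))   # 900
```

The two estimates (740 from three cycles, 900 from one cycle) agree to within the precision expected of a hypothetical example.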

Myoglobin is strongly colored and it is possible
that binding of HHMb to M13 could provide enough
optical absorption to allow FACS sorting of M13 that
bind HHMb (See Sec. 10.4).

We have now constructed LG7 that displays one or
more BPTI domains on each virion. The osp-ipbd gene is
under control of the lacUV5 promoter so that expression
levels of BPTI-M13 CP can be manipulated via [IPTG].
This construct may be used to develop many different
binding proteins, all based on BPTI. An optimum level
of induction has been determined. An optimum amount of
AfM(PBD), DoAMoM_optimum = 4.0 mg/(ml of support), has
been determined; target molecules will be applied to
columns at this level in the process disclosed in Sec.
15.1. These optimum levels may be adequate for all
targets and all variegations of BPTI displayed on
derivatives of M13 based on LG7, but some further
optimization may be needed if other values of pH or
temperatures are used.

Other pbd gene fragments may be substituted for
the bpti gene fragment in pLG7 with a high likelihood
that PBD will appear on the surface of the new LG7
derivative.



HHMb is chosen as a typical protein target; any
other protein could be used. HHMb satisfies all of the
criteria for a target: 1) it is large enough to be
applied to an affinity matrix, 2) after attachment it
is not reactive, and 3) after attachment there is
sufficient unaltered surface to allow specific binding
by PBDs.

The essential information for HHMb is known: 1)
HHMb is stable at least up to 70°C, between pH [?] and
9.3, 2) HHMb is stable up to 1.6 M guanidinium Cl, 3)
the pI of HHMb is 7.0, 4) for HHMb, M_r = 16,900, 5)
HHMb requires haem, 6) HHMb has no proteolytic
activity.



In addition, the following information about HHMb
and other myoglobins is available: 1) the sequence of
HHMb is known, 2) the 3D structure of sperm whale
myoglobin is known; HHMb has 19 amino acid differences
and it is generally assumed that the 3D structures are
almost identical, 3) HHMb has no enzymatic activity, 4)
HHMb is not toxic.

We set the specifications of an SBD as:

1) T = 25°C

2) pH = 8.0

3) Acceptable solutes:
   A) for binding:
      i) phosphate, as buffer, 0 to 20 mM, and
      ii) KCl, 10 mM,
   B) for column elution:
      i) phosphate, as buffer, 0 to 30 mM,
      ii) KCl, up to 5 M, and
      iii) Guanidinium Cl, up to 0.8 M.

4) Acceptable K_d < 1.0 x 10^-[?] M.

We choose LG7 as CP(IPBD). 

As stated in Sec. 13.1, the residues to be varied
are picked, in part, through the use of interactive
computer graphics to visualize the structures. In this
section, all residue numbers refer to BPTI. We pick a
set of residues that forms a surface such that all
residues can contact one target molecule. Information
that we refer to during the process of choosing
residues to vary includes: 1) the 3D structure of BPTI,
2) solvent accessibility of each residue as computed by
the method of Lee and Richards (LEEB71), 3) a
compilation of sequences of other proteins homologous
to BPTI, and 4) knowledge of the structural nature of
different amino acid types.

Tables 16 and 34 indicate which residues of BPTI:
a) have substantial surface exposure, and b) are known
to tolerate other amino acids in other closely related
proteins. We use interactive computer graphics to pick
sets of eight to twenty residues that are exposed and
variable and such that all members of one set can touch
a molecule of the target material at one time. If BPTI
has a small amino acid at a given residue, that amino
acid may not be able to contact the target
simultaneously with all the other residues in the
interaction set, but a larger amino acid might well
make contact. A charged amino acid might affect
binding without making direct contact. In such cases,
the residue should be included in the interaction set,
with a notation that larger residues might be useful.
In a similar way, large amino acids near the geometric
center of the interaction set may prevent residues on
either side of the large central residue from making
simultaneous contact. If a small amino acid, however,
were substituted for the large amino acid, then the
surface would become flatter and residues on either
side could make simultaneous contact. Such a residue
should be included in the interaction set with a
notation that small amino acids may be useful.

Table 35 was prepared from standard model parts
and shows the maximum span between C_beta and the tip of
each type of side group. C_beta is used because it is
rigidly attached to the protein main-chain; rotation
about the C_alpha-C_beta bond is the most important
degree of freedom for determining the location of the
side group.

Table 34 indicates five surfaces that meet the
given criteria. The first surface comprises the set of
residues that actually contacts trypsin in the complex
of trypsin with BPTI as reported in the Brookhaven
Protein Data Bank entry "1TPA". This set is indicated
by the number "1". The exposed surface of the residues
in this set (taken from Table 16) totals 1148 Å^2.
Although this is not strictly the area of contact
between BPTI and trypsin, it is approximately the same.

Other surfaces, numbered 2 to 5, were picked by
first picking one exposed, variable residue and then
picking neighboring residues until a surface was
defined. The choice of sets of residues shown in Table
34 is in no way exhaustive or unique; other sets of
variable, surface residues can be picked. Set #2 is
shown in stereo view, Figure 10, including the alpha
carbons of BPTI, the disulfide linkages, and the side
groups of the set. We take the orientation of BPTI in
Figure 10 as a standard orientation, and hereinafter
refer to K15 as being at the top of the molecule, while
the carboxy and amino termini are at the bottom.

Solvent accessibilities are useful, easily
tabulated indicators of a residue's exposure. Solvent
accessibilities must be used with some caution: small
amino acids are under-represented and large amino acids
over-represented. The user must consider what the
solvent accessibility of a different amino acid would
be when substituted into the structure of BPTI.



To create specific binding between a derivative of
BPTI and HHMb, we will vary the residues in set #2.
This set includes the twelve principal residues 17(R),
19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V),
48(A), 49(E), and 52(M) (Sec. 13.1.1). None of the
residues in set #2 is completely conserved in the
sample of sequences reported in Table 34; thus we can
vary them with a high probability of retaining the
underlying structure. Independent substitution at each
of these twelve residues of the amino acid types
observed at that residue would produce approximately
4.4 x 10^[?] amino acid sequences and the same number of
surfaces.

BPTI is a very basic protein. This property has
been used in isolating and purifying BPTI and its
homologues so that the high frequency of arginine and
lysine residues may reflect bias in isolation and is
not necessarily required by the structure. Indeed,
SCI-III from Bombyx mori contains seven more acidic
than basic groups (SASA84).

Residue 17 is highly variable and fully exposed
and can contain R, K, A, Y, [?], F, L, H, T, C, Y, P, or
S. All types of amino acids are seen: large, small,
charged, neutral, and hydrophobic. That no acidic
groups are observed may be due to bias in the sample.

Residue 19 is also variable and fully exposed,
containing P, R, I, S, K, Q, and L.

Residue 21 is not very variable, containing F or Y
in 31 of 33 cases and I and W in the remaining cases.
The side group of Y21 fills the space between T32 and
the main chain of residues 47 and 48. The OH at the
tip of the Y side group projects into the solvent.
Clearly one can vary the surface by substituting Y or F
so that the surface is either hydrophobic or
hydrophilic in that region. It is also possible that
the other aromatic amino acid (viz. H) or the other
hydrophobics (L, M, or V) might be tolerated.

Residue 27 most often contains [?], but S, K, L, and
T are also observed. On structural grounds, this
residue will probably tolerate any hydrophilic amino
acid and perhaps any amino acid.

Residue 28 is G in BPTI. This residue is in a
turn, but is not in a conformation peculiar to glycine.
[One sentence is illegible here; it concerns binding of
HHMb on the surface defined by the other residues of
the principal set.] Any amino acid, except perhaps P,
should be tolerated.

Residue 29 is highly variable, most often
containing L. This fully exposed position will
probably tolerate almost any amino acid except,
perhaps, P.

Residues 31, 32, and 34 are highly variable,
exposed, and in extended conformations; any amino acid
should be tolerated.

Residues 48 and 49 are also highly variable and
fully exposed; any amino acid should be tolerated.

Residue 52 is in an alpha helix. Any amino acid,
except perhaps P, might be tolerated.

Now we consider possible variation of the
secondary set (Sec. 13.1.2) of residues that are in the
neighborhood of the principal set. Neighboring
residues that might be varied at later stages include
9(P), 11(T), 15(K), 16(A), 18(I), 20(R), 22(F), 24(N),
26(K), 35(Y), 47(S), 50(D), and 53(R).
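
The residue labels in the principal and secondary sets can be cross-checked against the amino acid sequence of mature BPTI. The sketch below is a consistency check only. The 58-residue sequence is not given in this section; it is the commonly cited sequence of bovine pancreatic trypsin inhibitor and is included here as an assumption. The dictionary and function names are hypothetical.

```python
# Commonly cited sequence of mature BPTI (R1 ... A58); assumed, not from the text.
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"

PRINCIPAL = {17: "R", 19: "I", 21: "Y", 27: "A", 28: "G", 29: "L",
             31: "Q", 32: "T", 34: "V", 48: "A", 49: "E", 52: "M"}
SECONDARY = {9: "P", 11: "T", 15: "K", 16: "A", 18: "I", 20: "R", 22: "F",
             24: "N", 26: "K", 35: "Y", 47: "S", 50: "D", 53: "R"}
BURIED = {30: "C", 33: "F"}


def mismatches(sequence, residues):
    """Return {position: (expected, found)} for labels that disagree.

    Positions are 1-based, as in the text.
    """
    return {pos: (aa, sequence[pos - 1])
            for pos, aa in residues.items()
            if sequence[pos - 1] != aa}


for name, residues in [("principal", PRINCIPAL),
                       ("secondary", SECONDARY),
                       ("buried", BURIED)]:
    print(name, len(residues), "residues, mismatches:", mismatches(BPTI, residues))
```

An empty mismatch dictionary for each set means the one-letter labels attached to the residue numbers agree with the assumed sequence.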




Residue 9 is highly variable, extended, and
exposed. Residue 9 and residues 48 and 49 are
separated by a bulge caused by the ascending chain from
residue 31 to 34. For residue 9 and residues 48 and 49
to contribute simultaneously to binding, either the
target must have a groove into which the chain from 31
to 34 can fit, or all three residues (9, 48, and 49)
must have large amino acids that effectively reduce the
radius of curvature of the BPTI derivative.

Residue 11 is highly variable, extended, and
exposed. Residue 11, like residue 9, is slightly far
from the surface defined by the principal residues and
will contribute to binding in the same circumstances.

Residue 15 is highly varied. The side group of
residue 15 points away from the face defined by set #2.
Changes of charge at residue 15 could affect binding on
the surface defined by residue set #2.

Residue 16 is varied but points away from the
surface defined by the principal set. Changes in
charge at this residue could affect binding on the face
defined by set #2.

Residue 18 is I in BPTI. This residue is in an
extended conformation and is exposed. Five other amino
acids have been observed at this residue: M, F, L, V,
and T. Only T is hydrophilic. The side group points
directly away from the surface defined by residue set
#2. Substitution of charged amino acids at this
residue could affect binding at the surface defined by
residue set #2.

Residue 20 is R in BPTI. This residue is in an
extended conformation and is exposed. Four other amino
acids have been observed at this residue: A, S, L, and
Q. The side group points directly away from the
surface defined by residue set #2. Alteration of the
charge at this residue could affect binding at the
surface defined by residue set #2.

Residue 22 is only slightly varied, being Y, F, or
H in 30 of 33 cases. Nevertheless, A, K, and S have
been observed at this residue. Amino acids such as L,
H, I, or Q could be tried here. Alterations at residue
22 may affect the mobility of residue 21; changes in
charge at residue 22 could affect binding at the
surface defined by residue set #2.

Residue 24 shows some variation, but probably can
not interact with one molecule of the target
simultaneously with all the residues in the principal
set. Variation in charge at this residue might have an
effect on binding at the surface defined by the
principal set.

Residue 26 is highly varied and exposed. Changes
in charge may affect binding at the surface defined by
residue set #2; substitutions may affect the mobility
of residue 27 that is in the principal set.

[Residue 35 ...] W has been observed. The side
group of 35 is buried, but substitution of F or W could
affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample
used. The O_gamma probably accepts a hydrogen bond from
the NH of residue 50 in the alpha helix. Nevertheless,
there is no overwhelming steric reason to preclude
other amino acid types at this residue. In particular,
other amino acids the side groups of which can accept
hydrogen bonds, viz. N, D, Q, and E, may be acceptable
here.

Residue 50 is often an acidic amino acid, but
other amino acids are possible.

Residue 53 is often R, but other amino acids have
been observed at this residue. Changes of charge may
affect binding to the amino acids in interaction set
#2.



Stereo Figure 10 shows the residues in set #2,
plus R39. From Figure 10, one can see that R39 is on
the opposite side of BPTI from the surface defined by
the residues in set #2. Therefore, variation at
residue 39 at the same time as variation of some
residues in set #2 is much less likely to improve
binding that occurs along surface #2 than is variation
of the other residues in set #2.

In addition to the twelve principal residues and
13 secondary residues, there are two other residues,
30(C) and 33(F), involved in surface #2 that we will
probably not vary, at least not until late in the
procedure. These residues have their side groups
buried inside BPTI and are conserved. Changing these
residues does not change the surface nearly so much as
does changing residues in the principal set. These
buried, conserved residues do, however, contribute to
the surface area of surface #2. The surface of residue
set #2 is comparable to the area of the trypsin-binding
surface. Principal residues 17, 19, 21, 27, 28, 29,
31, 32, 34, 48, 49, and 52 have a combined solvent-
accessible area of 946.9 Å^2. Secondary residues 9, 11,
15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53 have
combined surface of 1041.7 Å^2. Residues 30 and 33 have
exposed surface totaling 38.2 Å^2. Thus the three
groups' combined surface is 2026.8 Å^2.
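
The three group areas can be checked against the stated total. In this sketch the leading digit of the principal-set area and the buried-residue area are taken as 946.9 and 38.2 Å^2, which are the values that make the three figures add up to the quoted 2026.8 Å^2; the variable names are hypothetical.

```python
import math

# Solvent-accessible areas in square Angstroms, per group of residues.
areas = {
    "principal (12 residues)": 946.9,
    "secondary (13 residues)": 1041.7,
    "buried 30 and 33": 38.2,
}

total_area = sum(areas.values())
print(f"combined surface: {total_area:.1f} A^2")
print("matches quoted total:", math.isclose(total_area, 2026.8, abs_tol=0.05))
```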

Residue 30 is C in BPTI and is conserved in all
homologous sequences. It should be noted, however,
that C14/C38 is conserved in all natural sequences, yet
Marks et al. (MARK87) showed that changing both C14 and
C38 to A,A or T,T yields a functional trypsin
inhibitor. Thus it is possible that BPTI-like
molecules will fold if C30 is replaced.

Residue 33 is F in BPTI and in all homologous
sequences. Visual inspection of the BPTI structure
suggests that substitution of Y, M, H, or L might be
tolerated.

Having identified twenty residues that define a
possible binding surface, we must choose some to vary
first. Given our hypothetical affinity separation
sensitivity, C_sensi, we decide to vary six residues
leaving some margin for errors in the actual base
composition of variegated bases. To obtain maximal
recognition, we choose residues from the principal set
that are as far apart as possible. Table 36 shows the
distances between the beta carbons of residues in the
principal and peripheral set. R17 and V34 are at one
end of the principal surface. Residues A27, G28, L29,
A48, E49, and M52 are at the other end, about twenty
Angstroms away; of these, we will vary residues 17, 27,
29, 34, and 48. Residues 28, 49, and 52 will be varied
at later rounds.

Of the remaining principal residues, 21 is left to
later variations. Among residues 19, 31, and 32, we
arbitrarily pick 19 to vary.



Unlimited variation of six residues produces 6.4 x
10^7 amino acid sequences. By hypothesis, C_sensi is 1
in 4 x 10^8. Table 37 shows the programmed variegation
at the chosen residues. The parental sequence is
present as 1 part in 5.5 x 10^7, but the least favored
sequences are present at only 1 part in 4.2 x 10^9.
Among single-amino-acid substitutions from the PPBD,
the least favored is F17-I19-A27-L29-V34-A48 and has a
calculated abundance of 1 part in 1.5 x 10^8. Using the
optimal qfk codon, we can recover the parental sequence
and all one-amino-acid substitutions to the PPBD if
actual nt compositions come within 5% of programmed
compositions. The number of transformants is M_ntv =
1.0 x 10^[?] (also by hypothesis), thus we will produce
most of the programmed sequences.

The residue numbers of the preceding section are
referred to mature BPTI (R1-P2-...-A58). Table 25 has
residue numbers referring to the pre-M13CP-BPTI
protein; all mature BPTI sequence numbers have been
increased by the length of the signal sequence, i.e.
23. Thus in terms of the pre-OSP-PBD residue numbers,
we wish to vary residues 40, 42, 50, 52, 57, and 71. A
DNA subsequence containing all these codons is found
between the (Apa I/Dra II/Pss I) sites at base 191 and
the Sph I site at base 309 of the osp-pbd gene. Among
Apa I, Dra II, and Pss I, Apa I is preferred because it
recognizes six bases without any ambiguity; Dra II and
Pss I, on the other hand, recognize six bases with two-
fold ambiguity at two of the bases. The vgDNA will
contain more Dra II and Pss I recognition sites at the
varied locations than it will contain Apa I recognition
sites. The unwanted extraneous cutting of the vgDNA by
Apa I and Sph I will eliminate a few sequences from our
population. This is a minor problem, but by using the
more specific enzyme (Apa I), we minimize the unwanted
effects. The sequence shown in Table 37 illustrates an
additional way in which gratuitous restriction sites
can be avoided in some cases. The osp-ipbd gene had
the codon GGC for G51; because we are varying both
residue 50 and 52, it is possible to obtain an Apa I
site. If we change the glycine codon to GGT, the Apa I
site can no longer arise. Apa I recognizes the DNA
sequence (GGGCC/C).
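
Two points in the paragraph above can be illustrated with a short calculation: the shift from mature BPTI numbering to precursor numbering, and the effect of the glycine codon at position 51 on gratuitous Apa I sites. This is a sketch under stated assumptions: codons 50 and 52 are modelled as unweighted NNK codons (N = any base, K = T or G), which captures which bases are possible but not the weighted mixture, and all function names are hypothetical.

```python
from itertools import product

SIGNAL_LENGTH = 23       # length of the signal sequence, as stated in the text
APA_I_SITE = "GGGCCC"    # Apa I recognition sequence (cut as GGGCC/C)


def to_precursor_numbering(mature_positions, offset=SIGNAL_LENGTH):
    """Convert mature BPTI residue numbers to pre-OSP-PBD residue numbers."""
    return [position + offset for position in mature_positions]


def count_apa_i_sites(glycine_codon):
    """Count codon-50/51/52 combinations that contain an Apa I site.

    Codons 50 and 52 are enumerated as NNK; codon 51 is fixed.
    """
    nnk_codons = ["".join(bases) for bases in product("TCAG", "TCAG", "TG")]
    return sum(APA_I_SITE in codon50 + glycine_codon + codon52
               for codon50 in nnk_codons
               for codon52 in nnk_codons)


varied = to_precursor_numbering([17, 19, 27, 29, 34, 48])
print(varied)
print("sites with GGC at G51:", count_apa_i_sites("GGC"))
print("sites with GGT at G51:", count_apa_i_sites("GGT"))
```

With GGC, a site arises whenever codon 50 ends in G and codon 52 begins with CC; with GGT, no combination of the flanking codons can complete the recognition sequence.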





Each piece of dsDNA to be synthesized needs six to
eight bases added at either end to allow cutting with
restriction enzymes and is shown in Table 37. The
first synthetic base (before cutting with Apa I and Sph
I) is 184 and the last is 322. There are 142 bases to
be synthesized. The center of the piece to be
synthesized lies between Q54 and V57. The overlap can
not include varied bases, so we choose bases 245 to 256
as the overlap that is 12 bases long. Note that the
codon for F56 has been changed to TTC to increase the
GC content of the overlap. The amino acids that are
being varied are marked as X with a plus over them.
Codons 57 and 71 are synthesized on the sense (bottom)
strand. The design calls for "qfk" in the antisense
strand, so that the sense strand contains (from 5' to
3') a) equal parts C and A (i.e. the complement of k),
b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) (i.e. the
complement of f), and c) (0.26 T, 0.26 A, 0.30 C, and
0.18 G).



Each residue that is encoded by "qfk" has 21
possible outcomes, each of the amino acids plus stop.
Table 12 gives the distribution of amino acids encoded
by "qfk", assuming 5% errors. The abundance of the
parental sequence is the product of the abundances of R
x I x A x L x V x A. The abundance of the least-
favored sequence is 1 in 4.2 x 10^9.
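
The amino acid distribution produced by the "qfk" codon, and from it the abundance of any six-residue sequence, can be computed from the base mixtures and the standard genetic code. The sketch below is an approximation, not a reproduction of Table 12: the base frequencies are the complements of the sense-strand mixtures described above (with the illegible base letters filled in on the assumption that each mixture lists all four bases), and no allowance is made for the 5% composition error, so the results come out close to, but not identical with, the figures quoted in the text. All names are hypothetical.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

# Assumed base mixtures at the three codon positions of "qfk".
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}   # first base
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}   # second base
K = {"T": 0.50, "C": 0.00, "A": 0.00, "G": 0.50}   # third base


def amino_acid_distribution(first, second, third):
    """Return {amino acid or '*': probability} for a variegated codon."""
    distribution = {}
    for codon, amino_acid in CODON_TABLE.items():
        p = first[codon[0]] * second[codon[1]] * third[codon[2]]
        if p > 0.0:
            distribution[amino_acid] = distribution.get(amino_acid, 0.0) + p
    return distribution


def sequence_abundance(distribution, residues):
    """Probability of one specific sequence at independently varied codons."""
    return prod(distribution[aa] for aa in residues)


dist = amino_acid_distribution(Q, F, K)
parental = sequence_abundance(dist, "RIALVA")        # R17 I19 A27 L29 V34 A48
rarest = min(p for aa, p in dist.items() if aa != "*")
least_favored = rarest ** 6

print("possible outcomes per codon:", len(dist))
print(f"parental sequence: 1 in {1 / parental:.1e}")
print(f"least-favored sequence: 1 in {1 / least_favored:.1e}")
```

The same function can be used to compare alternative base mixtures, for example an unweighted NNK codon, by passing different dictionaries for the three positions.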





Olig#27 and olig#28 are annealed and extended with
Klenow fragment and all four (nt)TPs. Both the ds
synthetic DNA and RF pLG7 DNA are cut with both Apa I
and Sph I. The cut DNA is purified and the appropriate
pieces ligated (See Sec. 14.1) and used to transform
competent PE383 (Sec. 14.2). In order to generate a
sufficient number of transformants, the culture volume
is set to 5000 ml.



1) culture E. coli in 5.0 l of LB broth at 37°C
until cell density reaches 5 x 10^[?] to 7 x 10^[?]
cells/ml,

2) chill on ice for 65 minutes, centrifuge the
cell suspension at 4000 g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in
1667 ml of an ice-cold, sterile solution of 60
mM CaCl2,

4) chill on ice for 15 minutes, and then
centrifuge at 4000 g for 5 minutes at 4°C,

5) discard supernatant; resuspend cells in 2 x
400 ml of ice-cold, sterile 60 mM CaCl2; store
cells at 4°C for 24 hours,

6) add DNA in ligation or TE buffer; mix and
store on ice for 30 minutes; 20 ml of solution
containing 5 ug/ml of DNA is used,

7) heat shock cells at 42°C for 90 seconds,

8) add 200 ml LB broth and incubate at 37°C for
1 hour,

9) add the culture to 2.0 l of LB broth
containing ampicillin at 35-100 ug/ml and
culture for 2 hours at 37°C,

10) centrifuge at [?]000 g for 20 minutes at 4°C,

11) discard supernatant, resuspend cells in 50
ml of LB broth with ampicillin and incubate 1
hour at 37°C,

12) plate cells on LB agar containing
ampicillin,

13) harvest virions by method of Salivar et al.
(SALI64).



The heat shock of step (7) can be done by dividing the
200 ml into 100 200 ul aliquots in 1.5 ml plastic
Eppendorf tubes. It is possible to optimize the heat
shock for other volumes and kinds of container. It is
important to: a) use all or nearly all the vgDNA
synthesized in ligation (this will require large
amounts of pLG7 backbone), b) use all or nearly all the
ligation mixture to transform cells, and c) culture all
or nearly all the transformants at high density. These
measures are directed at maintaining diversity.

IPTG is added to the growth medium at 2.0 mM (the
optimal level) and virions are harvested in the usual
way (Sec. 14.3). It is important to collect virions in
a way that samples all or nearly all the transformants.
Because F- cells are used in the transformation,
multiple infections do not pose a problem.

HHMb has a pI of 7.0 and we carry out
chromatography at pH 8.0 so that HHMb is slightly
negative while BPTI and most of its mutants are
positive. HHMb is fixed (Sec. 15.1) to a 2.0 ml column
on Affi-Gel 10(TM) or Affi-Gel 15(TM) at 4.0 mg/ml
support matrix, the same density that is optimal for a
column supporting trp.

We note that charge repulsion between BPTI and
HHMb should not be a serious problem and does not
impose any constraints on ions or solutes allowed as
eluants. Neither BPTI nor HHMb has special
requirements that constrain choice of eluants. The
eluant of choice is KCl in varying concentrations.

To remove variants of BPTI with strong,
indiscriminate binding for any protein or for the
support matrix (Sec. 15.2), we pass the variegated
population of virions over a column that supports
bovine serum albumin (BSA) before loading the
population onto the {HHMb} column. Affi-Gel 10(TM) or
Affi-Gel 15(TM) is used to immobilize BSA at the
highest level the matrix will support. A 10.0 ml
column is loaded with 5.0 ml of Affi-Gel-linked-BSA;
this column, called {BSA}, has Vv = 5.0 ml. The
variegated population of virions, containing 10^12
virions in 1 ml (0.2 x Vv) of 10 mM KCl, 1 mM
phosphate, pH 8.0 buffer, is applied to {BSA}. We wash
{BSA} with 4.5 ml (0.9 x Vv) of 50 mM KCl, 1 mM
phosphate, pH 8.0 buffer. The wash with 50 mM salt will
elute virions that adhere slightly to BSA but not
virions with strong binding. The pooled effluent of the
{BSA} column is 5.5 ml of approximately 13 mM KCl.

The column {HHMb} is first blocked by treatment
with 10^[exponent illegible] virions of M13(am429) in
100 ul of 10 mM KCl buffered to pH 8.0 with phosphate;
the column is washed with the same buffer until OD260
returns to base line or 2 x Vv have passed through the
column, whichever comes first. The pooled effluent from
{BSA} is added to {HHMb} in 5.5 ml of 13 mM KCl, 1 mM
phosphate, pH 8.0 buffer. The column is eluted (Sec.
15.3) in the following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 260 nm falls to base line
or 2 x Vv, whichever is first (effluent
discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH
held at 8.0 with phosphate (30 x 100 ul
fractions),

3) a gradient of 2 M to 5 M KCl in 3 x Vv,
phosphate buffer to pH 8.0 (30 x 100 ul
fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x Vv, with phosphate buffer to pH 8.0 (20 x
100 ul fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1
x Vv, with phosphate buffer to pH 8.0 (10 x 100
ul fractions).
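The elution schedule can be tabulated fraction by fraction. The following Python sketch is illustrative only: it assumes that each gradient is linear and that each fraction is characterised by its midpoint, neither of which is stated in the text. The names `Step`, `SCHEDULE`, and `fraction_conditions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kcl_start: float    # KCl at start of step, in M
    kcl_end: float      # KCl at end of step, in M
    gdn_start: float    # guanidinium Cl at start of step, in M
    gdn_end: float      # guanidinium Cl at end of step, in M
    n_fractions: int    # number of 100 ul fractions collected


# Steps 2-5 of the elution; the effluent of step 1 is discarded.
SCHEDULE = [
    Step(0.010, 2.0, 0.0, 0.0, 30),
    Step(2.0, 5.0, 0.0, 0.0, 30),
    Step(5.0, 5.0, 0.0, 0.8, 20),
    Step(5.0, 5.0, 0.8, 0.8, 10),
]


def fraction_conditions(schedule=SCHEDULE):
    """Return (KCl M, guanidinium M) at the midpoint of every fraction.

    Assumes linear gradients within each step (an assumption).
    """
    conditions = []
    for step in schedule:
        for i in range(step.n_fractions):
            f = (i + 0.5) / step.n_fractions
            kcl = step.kcl_start + f * (step.kcl_end - step.kcl_start)
            gdn = step.gdn_start + f * (step.gdn_end - step.gdn_start)
            conditions.append((kcl, gdn))
    return conditions
```

Such a table makes it easy to read off the approximate salt concentration at which a given fraction eluted.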



In addition to the elution fractions, a sample is
removed from the column and used as an inoculum for
phage-sensitive Sup cells (Sec. 15.4). A sample of 4
ul from each fraction is plated on phage-sensitive Sup
cells. Fractions that yield too many colonies to
count are replated at lower dilution. An approximate
titre of each fraction is calculated. Starting with
the last fraction and working toward the first fraction
that was titered, we pool fractions until approximately
10^9 phage are in the pool, i.e. about 1 part in 10^3 of
the phage applied to the column. This population is
infected into 3 x 10^[exponent illegible] phage-sensitive
PE383 in 300 ml of LB broth. The very low multiplicity
of infection (moi) is chosen to reduce the possibility
of multiple infection. After thirty minutes, viable
phage have entered recipient cells but have not yet
begun to produce new phage. Phage-borne genes are
expressed at this phase, and we can add ampicillin that
will kill uninfected cells. These cells still carry
F-pili and will absorb phage, helping to prevent
multiple infections.
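The pooling rule described above (start with the last fraction and work toward the first until roughly 10^9 phage are collected) can be sketched as follows. This is a minimal illustration, not the procedure's definitive form; the function name `pool_from_last` and the example titres are hypothetical.

```python
def pool_from_last(titres, target=1e9):
    """Pool fractions from the last toward the first.

    titres: estimated phage per fraction, in elution order.
    Returns (indices of pooled fractions, total phage pooled).
    Pooling stops as soon as the running total reaches the target.
    """
    pooled = []
    total = 0.0
    for i in range(len(titres) - 1, -1, -1):
        if total >= target:
            break
        pooled.append(i)
        total += titres[i]
    return pooled, total


# Hypothetical titres for five fractions, earliest first.
example = [5e11, 3e9, 6e8, 3e8, 2e8]
indices, total = pool_from_last(example)
```

With these made-up titres the last three fractions are pooled; late-eluting fractions are favored because they are the ones expected to carry the tightest binders.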

If multiple infection should pose a problem that
cannot be solved by growth at low multiplicity-of-infection
on cells, the following procedure can be employed to
obviate the problem. Virions obtained from the
affinity separation are infected into F+ E. coli and
cultured to amplify the genetic messages (Sec. 15.5).
CCC DNA is obtained either by harvesting RF DNA or by
in vitro extension of primers annealed to ss phage DNA.
The CCC DNA is used to transform F- cells at a high
ratio of cells to DNA. Individual virions obtained in
this way should bear only proteins encoded by the DNA
within.

The variegation produced as many as 6.4 x 10^7
different amino-acid sequences. Thus,
after two separation cycles, the probability of
isolating a single SBD is less than 0.10; after three
cycles, the probability rises above 0.10.

The phagemid population is grown and
chromatographed three times and then examined for SBDs
(Sec. 15.7). In each separation cycle, phage from the
last three fractions that contain viable phage are
pooled with phage obtained by removing some of the
support matrix as an inoculum. At each cycle, about
10^12 phage are loaded onto the column and about 10^9
phage are cultured for the next separation cycle.
After the third separation cycle, 32 colonies are
picked from the last fraction that contained viable
phage; phage from these colonies are denoted SBD1,
SBD2, ..., and SBD32.

Each of the SBDs is cultured and tested for
retention on a Pep-Tie column supporting HHMb (Sec.
15.8). Phage LG7(SBD11) shows the greatest retention
on the Pep-Tie {HHMb} column, eluting at 367 mM KCl
while wtM13 elutes at 20 mM KCl. SBD11 becomes the
parental amino-acid sequence to the second variegation
cycle.



The result of this hypothetical experiment is
shown in Table 38. R40 changed to D, I42 changed to
Q, A50 changed to E, L52 remained L, and A71 changed to
W.

The next round of variegation (Sec. 16) is
illustrated in Table 39. The residues to be varied are
chosen by: a) choosing some of the residues in the
principal set that were not varied in the first round
(viz. residues 42, 44, 51, 54, 55, 72, or 75 of the
fusion), and b) choosing some residues in the secondary
set. Residues 51, 54, 55, and 72 are varied through
all twenty amino acids and, unavoidably, stop. Residue
44 is only varied between Y and F. Some residues in
the secondary set are varied through a restricted
range, primarily to allow different charges (+, 0, -)
to appear. Residue 38 is varied through K, R, E, or G.
Residue 41 is varied through I, V, K, or E. Residue 43
is varied through R, S, G, N, K, D, E, T, or A.
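The size of a variegated population follows directly from the number of amino acids allowed at each varied residue. The sketch below is illustrative: it counts amino-acid sequences only (stop codons are ignored), and it assumes that the first round varied six residues through all twenty amino acids, which is consistent with the six-residue parental product and the 6.4 x 10^7 figure quoted earlier. The function name `n_sequences` is hypothetical.

```python
from math import prod


def n_sequences(choices_per_residue):
    """Number of distinct amino-acid sequences for a variegation scheme.

    choices_per_residue: one integer per varied residue giving the
    number of amino acids allowed there (stop codons not counted).
    """
    return prod(choices_per_residue)


# Round 1 (assumed): six residues, each through all twenty amino acids.
round1 = n_sequences([20] * 6)

# Round 2 as described above: residues 51, 54, 55, 72 through twenty
# amino acids; residue 44 through 2; residues 38 and 41 through 4 each;
# residue 43 through 9.
round2 = n_sequences([20, 20, 20, 20, 2, 4, 4, 9])
```

Under these assumptions round 1 gives 6.4 x 10^7 sequences and round 2 about 4.6 x 10^7, so both rounds stay within a similar library size.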

Olig#29 and olig#30 are synthesized, annealed,
extended and cloned into pLG7 at the Apa I/Sph I sites.
The ligation mixture is used to transform 5 l of
competent PE383 cells so that 10^[exponent illegible]
transformants are obtained. A new {HHMb} is constructed
using the same support matrix as was used in round 1.
A sample of 10^12 of the harvested LG7 are applied to
{HHMb} and affinity separated. The last 10^9 phage off
the column and an inoculum are pooled and cultured. The
cultured phagemids are re-chromatographed for three
separation cycles. Thirty-two clonal isolates (denoted
SBD11-1, SBD11-2, ..., SBD11-32) are obtained from the
effluent of the third separation cycle and tested for
binding on a Pep-Tie {HHMb} column. Of this set,
SBD11-23 shows the greatest retention on the Pep-Tie
{HHMb} column, eluting at 692 mM KCl.

The result of this hypothetical selection is
shown in Table 40. Residue 38 (K15 of BPTI) changed to
E, 41 becomes V, 43 goes to N, 44 goes to F, 51 goes to
F, 54 goes to S, 55 goes to A, and 72 goes to Q.

The sbd11-23 portion of the osp-pbd gene is cloned
into an expression vector and BPTI(E15, D17, V18, Q19,
N20, F21, E27, F28, L29, S31, A32, S34, W71, Q72) is
expressed in the periplasm. This protein is isolated
by standard methods and its binding to HHMb is tested.
Kd is found to be 4.5 x 10^-7 M.



A third round of variegation, using SBD11-23 as
PPBD, is illustrated in Table 41; eight amino acids are
varied. Those in the principal set, residues 40, 55,
and 57, are varied through all twenty amino acids.
Residue 22 is varied through P, Q, K, A, or E.
Residue 34 is varied through T, P, Q, K, A, or E.
Residue 44 is varied through F, L, Y, C, or stop.
Residue 50 is varied through E, K, or Q. Residue 52 is
varied through L, F, I, M, or V.

The result of this variation is shown in Table 42.
The selected SBD is denoted SBD11-23-5 and elutes from
a Pep-Tie {HHMb} column at 980 mM KCl. The sbd11-23-5
segment is cloned into an expression vector and
BPTI(E5, Q11, E15, A17, V18, Q19, N20, W21, Q27, F28,
M29, S31, L32, I34, W71, Q72) is produced. This time
the Kd is 7.3 x 10^[exponent illegible].



This example is hypothetical. It is anticipated
that more variegation cycles will be needed to achieve
dissociation constants of 10^[exponent illegible] M. It
is also possible that more than three separation cycles
will be needed in some variegation cycles. Real DNA
chemistry and DNA synthesizers may have larger errors
than our hypothetical 5%. If Serr > 0.05, then we may
not be able to vary six residues at once. Variation of
5 residues at once is certainly possible.



Table 1: Single-letter codes.

Single-letter codes for amino acids:

a = ALA   c = CYS   d = ASP   e = GLU   f = PHE
g = GLY   h = HIS   i = ILE   k = LYS   l = LEU
m = MET   n = ASN   p = PRO   q = GLN   r = ARG
s = SER   t = THR   v = VAL   w = TRP   y = TYR
- STOP    * = any amino acid
b = n or d
z = e or q
x = any amino acid

Single-letter IUB codes for DNA:

T, C, A, G stand for themselves
M for A or C
R for puRines A or G
W for A or T
S for C or G
Y for pYrimidines T or C
K for G or T
V for A, C, or G (not T)
H for A, C, or T (not G)
D for A, G, or T (not C)
B for C, G, or T (not A)
N for any base.
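The ambiguity codes above map naturally onto a lookup table. The following Python sketch is an illustration, not part of the original method: it expands an ambiguous recognition sequence into the concrete sequences it covers and tests whether a given stretch of DNA matches. The names `IUB`, `expand`, and `matches` are hypothetical.

```python
from itertools import product

# Standard IUB/IUPAC nucleotide ambiguity codes.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def expand(site):
    """List every unambiguous sequence covered by an ambiguous site."""
    return ["".join(p) for p in product(*(IUB[c] for c in site.upper()))]


def matches(site, dna):
    """True if dna (same length as site) is covered by the ambiguous site."""
    site, dna = site.upper(), dna.upper()
    return len(site) == len(dna) and all(b in IUB[c] for c, b in zip(site, dna))
```

For example, `expand("GTYRAC")` lists the four hexamers covered by the Hind II recognition sequence GTYRAC given in Table 4, one of which is GTTAAC.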






Table 2: Preferred Outer-Surface Proteins

Genetic       Preferred Outer-
Package       Surface Protein    Reason

M13           coat protein       a) exposed amino terminus,
              (gpVIII)           b) predictable post-
                                    translational processing,
                                 c) numerous copies in
                                    virion.
              gp III             a) fusion data available.

PhiX174       G protein          a) known to be on virion
                                    exterior,
                                 b) small enough that
                                    the G-ipbd gene can
                                    replace H gene.

E. coli       LamB               a) fusion data available,
                                 b) non-essential.

B. subtilis   CotC               a) no post-translational
spores                              processing,
                                 b) distinctive sequence
                                    that causes protein to
                                    localize in spore coat,
                                 c) non-essential.
              CotD               Same as for CotC.




Table 3: Ambiguous DNA for AA_seq2

No.  AA  Ambiguous codon(s)
  1  m   A.T.G
  2  k   A.A.r
  3  k   A.A.r
  4  s   T.C.n / A.G.y
  5  l   T.T.r / C.T.n
  6  v   G.T.n
  7  l   T.T.r / C.T.n
  8  k   A.A.r
  9  a   G.C.n
 10  s   T.C.n / A.G.y
 11  v   G.T.n
 12  a   G.C.n
 13  v   G.T.n
 14  a   G.C.n
 15  t   A.C.n
 16  l   T.T.r / C.T.n
 17  v   G.T.n
 18  p   C.C.n
 19  m   A.T.G
 20  l   T.T.r / C.T.n
 21  s   T.C.n / A.G.y
 22  f   T.T.y
 23  a   G.C.n
 24  r   C.G.n / A.G.r
 25  p   C.C.n
 26  d   G.A.y
 27  f   T.T.y
 28  c   T.G.y
 29  l   T.T.r / C.T.n
 30  e   G.A.r
 31  p   C.C.n
 32  p   C.C.n
 33  y   T.A.y
 34  t   A.C.n
 35  g   G.G.n
 36  p   C.C.n
 37  c   T.G.y
 38  k   A.A.r
 39  a   G.C.n
 40  r   C.G.n / A.G.r
 41  i   A.T.h
 42  i   A.T.h
 43  r   C.G.n / A.G.r
 44  y   T.A.y
 45  f   T.T.y
 46  y   T.A.y
 47  n   A.A.y
 48  a   G.C.n
 49  k   A.A.r
 50  a   G.C.n
 51  g   G.G.n
 52  l   T.T.r / C.T.n
 53  c   T.G.y
 54  q   C.A.r
 55  t   A.C.n
 56  f   T.T.y
 57  v   G.T.n
 58  y   T.A.y
 59  g   G.G.n
 60  g   G.G.n
 61  c   T.G.y
 62  r   C.G.n / A.G.r
 63  a   G.C.n
 64  k   A.A.r
 65  r   C.G.n / A.G.r
 66  n   A.A.y
 67  n   A.A.y
 68  f   T.T.y
 69  k   A.A.r
 70  s   T.C.n / A.G.y
 71  a   G.C.n
 72  e   G.A.r
 73  d   G.A.y
 74  c   T.G.y
 75  m   A.T.G
 76  r   C.G.n / A.G.r
 77  t   A.C.n
 78  c   T.G.y
 79  g   G.G.n
 80  g   G.G.n
 81  a   G.C.n
 82  a   G.C.n
 83  e   G.A.r
 84  g   G.G.n
 85  d   G.A.y
 86  d   G.A.y
 87  p   C.C.n
 88  a   G.C.n
 89  k   A.A.r
 90  a   G.C.n
 91  a   G.C.n
 92  f   T.T.y
 93  n   A.A.y
 94  s   T.C.n / A.G.y
 95  l   T.T.r / C.T.n
 96  q   C.A.r
 97  a   G.C.n
 98  s   T.C.n / A.G.y
 99  a   G.C.n
100  t   A.C.n
101  e   G.A.r
102  y   T.A.y
103  i   A.T.h
104  g   G.G.n
105  y   T.A.y
106  a   G.C.n
107  w   T.G.G
108  a   G.C.n
109  m   A.T.G
110  v   G.T.n
111  v   G.T.n
112  v   G.T.n
113  i   A.T.h
114  v   G.T.n
115  g   G.G.n
116  a   G.C.n
117  t   A.C.n
118  i   A.T.h
119  g   G.G.n
120  i   A.T.h
121  k   A.A.r
122  l   T.T.r / C.T.n
123  f   T.T.y
124  k   A.A.r
125  k   A.A.r
126  f   T.T.y
127  t   A.C.n
128  s   T.C.n / A.G.y
129  k   A.A.r
130  a   G.C.n
131  s   T.C.n / A.G.y
132  stop  T.A.r / T.G.A
133  stop  T.A.r / T.G.A
134  stop  T.A.r / T.G.A
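Table 3 amounts to a back-translation of an amino-acid sequence into ambiguous codons. A minimal Python sketch of that lookup is given below; it uses the standard genetic code and the lowercase ambiguity letters (n, r, y, h) seen in the table. The names `AMBIGUOUS_CODONS`, `STOP_CODONS`, and `ambiguous_dna` are hypothetical, and the sketch is not claimed to be the program used to prepare the table.

```python
# Ambiguous codons for each amino acid (standard genetic code).
# Serine, leucine and arginine each need two ambiguous codons.
AMBIGUOUS_CODONS = {
    "a": ["G.C.n"], "c": ["T.G.y"], "d": ["G.A.y"], "e": ["G.A.r"],
    "f": ["T.T.y"], "g": ["G.G.n"], "h": ["C.A.y"], "i": ["A.T.h"],
    "k": ["A.A.r"], "l": ["T.T.r", "C.T.n"], "m": ["A.T.G"],
    "n": ["A.A.y"], "p": ["C.C.n"], "q": ["C.A.r"],
    "r": ["C.G.n", "A.G.r"], "s": ["T.C.n", "A.G.y"], "t": ["A.C.n"],
    "v": ["G.T.n"], "w": ["T.G.G"], "y": ["T.A.y"],
}

# Stop codons, listed separately from the amino acids.
STOP_CODONS = ["T.A.r", "T.G.A"]


def ambiguous_dna(aa_seq):
    """Return [(position, amino acid, ambiguous codons)] for a sequence."""
    return [
        (i, aa, AMBIGUOUS_CODONS[aa])
        for i, aa in enumerate(aa_seq.lower(), start=1)
    ]
```

Calling `ambiguous_dna("mkkslvlk")` reproduces the first eight rows of the table; the freedom left by the ambiguous positions is what allows restriction sites to be designed into the gene without changing the encoded protein.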







Table 4: Table of Restriction Enzymes

Table of restriction enzymes with IUB codes.
Suppliers:

S = Sigma Chemical Co.
    P.O. Box 14508
    St. Louis, Mo. 63178

B = Bethesda Research Laboratories
    P.O. Box 6009
    Gaithersburg, Maryland, 20877

M = Boehringer Mannheim Biochemicals
    7941 Castleway Drive
    Indianapolis, Indiana, 46250

I = International Biochemicals, Inc.
    P.O. Box 9558
    New Haven, Connecticut, 06535

N = New England BioLabs
    32 Tozer Road
    Beverly, Massachusetts, 01915

P = Promega
    2800 S. Fish Hatchery Road
    Madison, Wisconsin, 53711

T = Stratagene Cloning Systems
    11099 North Torrey Pines Road
    La Jolla, California, 92037

+ before enzyme name means that overhang can not be
  self-complementary.
% before enzyme name means that overhang may or may
  not be self-complementary.



Enzyxie 
Aat II 
tAcc I 



Reco»5nit . 

CACGTC 

CTMKAC 



Acc III TCCCCA 
Acv I GRCCYC 
Afl II CTTAAC 
t Afl IIIACRVCT 
Aha III TTTAAA 

+.V1WN I CACNKNCTC 



Aoa I 
ApaL I 
Ase I 
Asp7 18 
Asu II 
tAva I 



GCGCCC 
GTGCAC 
ATT AAT 
GGTACC 
TTCCAA 
CYCCRG 



Ava III ATGCAT 



Avr II 
Dal I ■ 
BanH I 
I 

Bbe I 
;Bbv I 



CCTACG 
TGGCCA 
GGATCC 
GCYRCC 
CGCGCC 
GCAGC 



Symn 
. P 
P 
P 
P 
P 
P 
P 

P 
R 
P 
P 
P 
P 
P 



P 
P 
P 
P 
P 
nP 
nP 
P 
P 
P 
nP 
r.P 
? 
nP 



cuts 
5, I 



2, 
1, 
2, 
I, 
I. 
3 , 



1 , 

2 , 



1, 
3, 
1/ 
1, 
5, 
13 , 



t pbv II GAAGAC 
Be l I TGATCA 
+ Bql I GCCN'NNNNGGC 
Bql II AGATCT 
+Bin I GCATC 
+ I GAATGCN 

RspH I TCATCA 
tBspM I ACCTCC 

BssH II CCGCGC ' P 

+Bst£ IIGCTNACC P 

i Bstx I cca::nnnk»tcc p 

Cf r I YGCCCR P 

Cla I ATCCAT P 

*Q£a II RCGNCCY ? 

+ 0513 IIICACSNNCTC P 

EC04 7 IIAGCGCT P 

^■ EcoN I CCTNNNNNAGG P 

EcoR r GAATTC P 

kcoR V CATATC P 

+ ESP I CCTN'ACC P 

I Fok I GCATC nP 

G.n II YGCCCC nP 

Mae I WCCCCW P 

tiae II RCCCCY P 



7 , 5 
1. 5 
10, 14 
I, 5 
1 , <^ 

s, 
1 , 

2< 



Supply 
<S. M, I,N,T 
<B,«,I,N,P,T 
<T 

< Aha II :N ■. 
<U 

<ncr.e 

<3, T &Dra r :M, I , 
P 

< M , I , N . ? . T 

< t; , T 

<N 

<none 

<P. Sf BstB 1) 
<S? , 3, M, I.N, P,T; 

Apu I:T 
<T; Nsi I:M,M.P, 

£coT22. I:T 

<n 

<S.B.I.N,T 

<S, B. M, I,N, P,T 

<:\, i,;.',T 
< 

<I,N,T 
< 

<S, 3,M,l/:^T 
<S, 3 
<S . c 
< Alvr :N 

<:;, T 
<i: 
<:i 

<;; , T 
<5, 



.N , F , T 



, M , N , T 



5 <: 



<S , 3, M. f^T; 

i JtATS 111:1 

2, 5 <::.r ; E cocno9 

6, 3 <.V.>',T 

3, 3 <nor.c 

5, 6 <:;(soon) 

5 <S. 3. M. I , N, P,T 

3 < S , 3 . M . I , M , P . T 

5 <T 

<:i .N'.T 
< 
< 

<S. B.M. I, N,T 



1 , 
3, 
2, 

14 , le 

1. 5 
3, 3 
5, 1 



Table 4, continued.


+ Hqa r 


CACCC 


nP 


10 




<N 


\Hg_iA I 


CWGCWC 


P 


t 




<N 


\}\QiC I 


CCYRCC 


P 


Q 


g 


< 


\HniJ IIGRGCYC 


P 


5 


\ 


^Ban II:S,M,I»S,T 


Hind II 


GTYRAC 


P 


3 


3 


<H; &H inc II:S,B,I 










, N , P, T 


Hind rilAACCTT 


p 


1 / 


e 
J 


^0 R M T MPT 


He^ I 


GTTAAC 


P 


3 , 


3 


<SBI1INP T 


»Hph I 


GGTGA 


nP 


1 3 , 


1 2 


<N , T 




CGTACC 


P 


S , 


1 


<S,B,M,I,N,P,T 












♦Kbo II 


CAACA 


nP 




12 


<S BIN 


Klu r 


ACCCGT 


P 


1, 


5 


<m!n, P,T 


Mst I 


TGCGCA 


P 


3 , 


3 


<T; Fsfi I:S,N 


Nao I 


CCCGGC 


P 


3 


3 


<M N T 


Nan I 


GCCGCC 


P 


2 




<B, N , T 


Nco I 


CCATCG 


P 




5 


<B M N P T 


Nd^ I 


CATATC 


P 


2 


^ 


< B N T 


t'hQ I 


CCTACC 


P 




5 


<:M , N , ? , T 


KOC I 


GCGCCCGC 


P 


2 ^ 


5 


<M, P,T 


Mru I 


TCGCCA 


P 


3! 


3 


<B,M,N,T 




RCATGY 


P 


5. 


\ 




N5D3 II 


CMC C KG 


P 


3, 


3 


< 


+ PIiM I 


CCANrfWNNTGG 


P 


7 , 


4 






GACTC'IINNN 


nP 


9, 


10 




S>r:.^C r 


CACGTG 


" P 


3 , 


3 


<none 




RGGWCCY 


P 


2 , 


5 


<II 




RGCtrCCY 


P 


5, 


2 


<! 




CTGCAG 


P 


5, 


1 


<S, B,M, I ,N, P,T 


Pvu I 


CGATCG 


P 


4 , 


2 


<S, D,N, B(Xor II) ,M 












, P,T 


£vu II 


CAGCTC 


p 


3 , 


3 


<S, B,«, I,N. P,T 


+Esr II 


CCGWCCG 


P 


2 , 


5 


<N,T 


Sac I 


GAGCTC 


P 


5, 


1 


<D(Ssr I) ,M,I,N,P, 
T 


Sac II 


CCGCGG 


P 


4 , 


2 


<B(Sst 11} .I,N,P,T 


Sal I 


GTCCAC 


P 


1, 


5 


<B,M,I,U,P,T 


+Sau I 


cct:iagg 


P 


2 , 


5 


<M; Cvn 1:8; Mst II 










:T: nsu3 6 I:N; Ac 


Sea I 


ACTACT 


P 


3 , 


3 


<M, N, P,T 


\SfaN I 


GCATC 


nP 


10, 


14 


<N 


^sn I 


CCCCNritUINGGCC P S, 


5 


<N,P,T 


Sr.a I 


CCCCGG 


P 


3, 


3 


<B,M, I,N, P.T 


SnaD I 


TACGTA 


P 


3* 


3 


<M, N,T 


Spe I 


ACTAGT 


p 


1, 


5 


<M,N,T 


Sph I 


GCATGC 


P 


5, 


1 


<B,M, I.N, F,T 


Ssp I 


AATATT 


P 


3 , 


3 


<M,N,T 


Stu I 


AGGCCT 


P 


3, 


3 


<tUJM(Aat I) ,P,T 


%StV I 


CCVT.JGG 


P 


1, 


5 





Table 4, continued.



^ Tao II CACCCA 
% TaQ II 'CACCCA 
■t- Tthll I I CACNNIiCTC 
% Tthlll II CAARCA 



Xba 
Xca 
Xho 

Xho 

Xna 



I 
I 
I 

II 
I 

III 



TCTAGA 
GTATAC 
CTCCAC 

RGATCY 
CCCGCG 
CGCCCC 



nP 
nP 
P 

nP ■ 
P 
P 
P 

P 
P 
P 



17,15 
< , 5 

16, 14 
1, 5 
3, 3 
1, 5 



15 Xr:in I 



GAANNNN'TTC 



5, 5 



<none 
<none 
<I ,N,T 
<none 

<B,M, I,K, P,T 
<U (soon) 

<B,M, I, P,T; Ccr I: 

T ; PaeR7 I : U 
<M.T ;Nf BstY I) 
<I,N. P,T 
<B; Eajg 

Eco52 I:T 
OI.Mf ASP700 ) ,T 



N_restrct = 100

Notes:
Symm: P for palindromic, nP for non-palindromic.
cuts: first number indicates position of cut in
      top strand, 1 means after first base of
      recognition; second number indicates
      position of cut in lower strand, counting
      left-to-right.

Table 5: Potential sites in ipbd gene.

Summary of cuts.

% Acc I has 3 elective sites : 
AC! ir has I elective sites : 
Ana I has 2 elective sites : 
Asu i: has 1 elective sites 
Ava III hcs 1 elective sites 
11 has 1 elective sites 



EspM 

BsrH II has 2 elective sites : 



tBstX T has 
+0X3 II has 



L elective sites 
-ruta 3 elective sites 

+gcoM I has 2 elective sites 
+ESD I has 2 elective sites 
Hind r:i has 6 elective sites 



96 169 281 
19 

102- 103 

381 
: 314 • 
: 72 

67 115 
323 

102 103 226 
62 94 
57 1S7 
9 23 60 



Kpn 
Mlu 
Nar 
Kco 
Nhc 



has 
has 
has 



elective sites 
elective sites 
elective sites 



has 1 elective sites 



+ pnH I has 



See 
Sch 
Stu 

= tst 

= Xba I has 



287 361 386 
4S 
314 

238 343 
323 • 
25 289 388 
38 65 
: 94 



has 3 elective sites : 
has 2 elective sites : 
has 1 elective sites 
PnaC I has 1 elective sites : -223 

I has 2 elective sites : 102 226 
+RTr II hJS 1 elective sites : 1C2 
+sTi I his 2 elective sites : 24 261 

I has 3 elective sites : 12- 45 379 
SDh I has 1 elective sites : 221 

I has 5 elective sites : 23 70 1^0 

237 3S6 

I has 6 elec::iva sites : 11 44 

143 263 323 3S3 

1 elective sites : 84 

2 elective sites : 96 169 
1 elective sites : 85 
iS 3 elective sites : 70 2C9
Xca
Xho 



I has 
I has 



= Xna III he 



Enzymes not cutting ipt:.1. 




BarH I 
EcoR V 
Sal I 



Scl I 
Hna I 
S.^u I 



Table 6: Exposure of amino acid types in T4 lzm & HEWL.

HEADER HYDROLASE (O-GLYCOSYL) 18-AUG-86 2LZM
COMPND LYSOZYME (E.C.3.2.1.17)
AUTHOR L.H.WEAVER, B.W.MATTHEWS

Coordinates from Brookhaven Protein Data Bank: 1LYM.
Only Molecule A was considered.

HEADER HYDROLASE (O-GLYCOSYL) 29-JUL-82 1LYM
COMPND LYSOZYME (E.C.3.2.1.17)
AUTHOR J.HOGLE, S.T.RAO, M.SUNDARALINGAM

Solvent radius = 1.40
Atomic radii in Table 7.
Surface area measured in Angstroms^2.

Type    sigma    exposed(fraction)



ALA 27 

CYS 10 

ASP 17 

GLU 10 

PHE 8 

CLY 2 3 

MIS 2 

ILE 16 



\0 



LVS 

LEU 2 4 

MET 7 

ASM 26 

PRO 5 

CLN 3 

ARC 24 

SER 16 

THR 18 

VAL 15 

TRP 9 

.TYR 9 



211.0 
239 . 8 
271. 1 
297 .2 
316 
185 
297 
273 
309 
282 
293 
273 
239 
299 
344 
223 
250 
254 
359 



6 
5 
7 

1 
2 

6 
0 
0 
9 
5 
7 
6 
3 
3 
4 

335.8 



1.47 
3.56 
5.26 
5.78 
5.92 
i;31 
3.23 
3.6i 
5.38 
6.75 
5.70 
5.75 
2.75 
4 ,75 
8.66 
3 . 59 



80 
05 
38 
97 



2 14 
245 
2£1 
304 
325 
183 
301 
285 
32 1 
304 
299 
235 
242 
305 . 8 
355. & 
236 
257 
261 
3C6 
342 



207. 
234 . 
262 . 
285. 
307 . 
183 . 
294 . 
269. 
UCO . 
269 . 
283. 
262, 
234 . 
291. 
326. 
223 . 
244 . 
245. 
355. 
325. 



85. 1 ( 
38. 3( 
127 . i( 
100.7 ( 
99. 8( 
91.9 f 
32. 9( 
57. 5C 



•0{ 



147 
109 

88. 2( 
143 .4 ( 
12S.7( 
145. 9( 
240. 7( 

5 3 . 2 ( 
139. 9( 
111. 1( 
102. 0( 

72. 6( 



-0) 
16) 

34) 
32) 
50) 
1 1) 
21) 
43) 
39) 
30) 
53) 

5-;) 

70) 
•i3) 
56) 
44) 
23) 
12) 



Table 7: Atomic radii (Angstroms)

Calpha        1.70
Ocarbonyl     1.52
Other atoms   1.80

Table 8

Fraction of DNA molecules having
n non-parental bases when
reagents that have fraction
M of parental nt are used.

M      .9965    .97716   .92612   .8577    .79433   .63096
f0     .9000    .5000    .1000    .0100    .0010    .000001
f1     .09499   .35061   .2393    .04977   .00777   .0000175
f2     .00485   .1183    .2763    .1197    .0292    .000149
f3     .00016   .0259    .2061    .1354    .0705    .000812
f4     .000004  .00409   .1110    .2077    .1232    .003207
f8     0.       2x10^-7  .00096   .0336    .1182    .080165
f16    0.       0.       0.       5x10^-7  .00006   .027231
f23    0.       0.       0.       0.       0.       .0000089
most   0        0        -        5        7        12

"most" is the value of n having the highest
probability.
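The entries of Table 8 behave like binomial probabilities. The sketch below is an interpretation rather than a statement of how the table was produced: it assumes each base is incorporated independently and that the DNA segment is 30 bases long, a length inferred from the f0 row (for example, .97716^30 is about .5). The function names are hypothetical.

```python
from math import comb


def fraction_with_n_changes(m, n, length=30):
    """Binomial fraction of molecules with exactly n non-parental bases.

    m: fraction of parental nucleotide at each position.
    length: number of bases; 30 is an assumption inferred from Table 8.
    """
    return comb(length, n) * (1.0 - m) ** n * m ** (length - n)


def most_likely_n(m, length=30):
    """Value of n with the highest probability under the same model."""
    return max(range(length + 1),
               key=lambda n: fraction_with_n_changes(m, n, length))
```

Under these assumptions, `fraction_with_n_changes(0.63096, 8)` is close to the .080165 entry in the f8 row, which suggests the table was generated by a calculation of this kind.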




Table 9: best vgCodon

Program "Find Optimum vgCodon."

INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.3? to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R + K
. . . . . . g2 = (g1*a2 - .5*a1*a2)/(c1 + 0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2 .gt. 0.1 .and. t2 .gt. 0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . end_IF_block
. . . . . . end_DO_loop ! c2
. . . . . end_DO_loop ! a2
. . . . end_IF_block ! if g1 big enough
. . . end_DO_loop ! a1
. . end_DO_loop ! c1
. end_DO_loop ! t1
WRITE the best distribution and the abundances.
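The CALCULATE-ABUNDANCES step and the "Force D+E = R + K" constraint of the program above can be sketched in Python. This is an illustration under stated assumptions: the third base is taken to be an equimolar mixture of T and G, the standard genetic code is used, and the nucleotide fractions in the example are illustrative values chosen inside the search ranges of Table 9. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def abundances(first, second, third):
    """Abundance of each amino acid (and stop) for a variegated codon.

    first, second, third: dicts mapping base letter to its fraction at
    that codon position; missing bases are treated as 0.
    """
    out = {}
    for i, b1 in enumerate(BASES):
        for j, b2 in enumerate(BASES):
            for k, b3 in enumerate(BASES):
                aa = AMINO[16 * i + 4 * j + k]
                p = first.get(b1, 0.0) * second.get(b2, 0.0) * third.get(b3, 0.0)
                out[aa] = out.get(aa, 0.0) + p
    return out


def g2_for_charge_balance(g1, a1, c1, a2):
    """g2 that makes D+E equal R+K when the third base is 0.5 T / 0.5 G."""
    return (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)


# Illustrative distribution (assumed values within the Table 9 ranges).
first = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
a2, c2 = 0.40, 0.16
g2 = g2_for_charge_balance(first["G"], first["A"], first["C"], a2)
second = {"A": a2, "C": c2, "G": g2, "T": 1.0 - a2 - c2 - g2}
third = {"T": 0.5, "G": 0.5}
ab = abundances(first, second, third)
```

The resulting dictionary has 21 entries, the twenty amino acids plus stop, and by construction the acidic residues (D+E) are exactly as abundant as the basic residues (R+K).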



Table 11: Calculate worst codon.

Program "Find worst vgCodon within Serr of given
distribution."

INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is % error level.
READ Serr
Comment T1i,C1i,A1i,G1i, T2i,C2i,A2i,G2i, T3i,G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1.-Serr
Fup = 1.+Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps)
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps)
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps)
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr)
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr)
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps)
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps)
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr)
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr)
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . . end_IF_block
. . . . . . IF( g2 .gt. 0.0 .and. t2 .gt. 0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . end_IF_block
. . . . . . end_DO_loop ! g2
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_DO_loop ! a1
. . end_DO_loop ! c1
. end_DO_loop ! t1
WRITE the WORST distribution and the abundances.
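The "push it back" step of Table 11 keeps the fourth base, which is computed by difference, within the allowed relative error and rescales the other three so that the fractions still sum to one. A minimal Python sketch of that single step follows; the function name `push_back` and the example values are hypothetical.

```python
def push_back(x, y, z, intended_fourth, serr):
    """Constrain the fourth fraction and renormalise the other three.

    x, y, z: the three independently varied base fractions.
    intended_fourth: intended fraction of the base computed by difference.
    serr: allowed relative error (e.g. 0.05 for 5%).
    Returns (x, y, z, fourth) with all four summing to 1.
    """
    fourth = 1.0 - x - y - z
    low = intended_fourth * (1.0 - serr)
    high = intended_fourth * (1.0 + serr)
    if fourth < low:
        fourth = low          # too far below the intended value
    elif fourth > high:
        fourth = high         # too far above the intended value
    else:
        return x, y, z, fourth
    factor = (1.0 - fourth) / (x + y + z)
    return x * factor, y * factor, z * factor, fourth


# Example: t1, c1 and a1 all at their upper limit, intended g1 = 0.30.
t1, c1, a1, g1 = push_back(0.26 * 1.05, 0.18 * 1.05, 0.26 * 1.05, 0.30, 0.05)
```

In the example the raw difference would leave g1 more than 5% below its intended value, so g1 is set to its lower limit and the other three fractions are scaled down slightly.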




Table 13, continued.

R 3 20 21 22 23 24 25 26 27 23 29 30 31 32 33 

-5 _------------D 

_4 _-, ._---------f: 

,3 _-_---------TP 

-2 2-LZRK---RR-ET 

-1 P-QDDN---QK-RT 

1 RRH HRRIKTRRRCO 

2 RPRPPPNEVHHPFL 

3 KYTKKTGDARPDL? 

4 LAFFFFDSADDFDI 

5 cccccccccccccc 

6 lEKYYKEQrJDDLTE' 

7 LLLLLLLLLKKESQ 
3 HIPPPLPGPPPPPA 
9 RVA AAPKYVPPPPFG 

10 MAE DDEVSIDD YVD 

11 PA PPPTVARKTTTA 

12 GGCCCGGCCCKGGG 

13 RPPRRRPPPS'IPPL 

14 C CCCCCCCCCCCCC 

15 YHKKLNRMR--KRF 

16 DFAAAAAGACQAAG 

17 KFSHYLRMFPTKGY 

18 IIIIMIFTIVVMFH 

19 PSPPPPPSQRRIKK 

20 AAARRARRLAARRL 

21 FFFFFFYYWFFYYY 

22 YYYYYYY FAYYFNS 

23 YYYYYYYYFYYYYY 
2 4 N S N D N N N' ti 0 D K M N N 
25 QKWSPSSCATPATQ 
2 6 KCA AAHSTVRSKRE 

27 KAASGLSSKLAATT 

28 KNKMKHKMGKKCKK 

29 QKKKKKRAKTRFQN 

30 CCCCCCCCCCCCCC 

31 EYQNEQEEVKVEEE 

32 RPLKKKKTLAQTPE 

33 FFFFFFFFFFFFFF 

34 DTH IINIQPQRVKI 

35 WYYYYYYYYYYYYY 

36 S3GGGGGGGRGGGG 

37 G GGGGCGGCGGGGG 

38 CCCCCCCCCCCCCC 

39 GRKPRGGHQDDKKQ 

40 CCCGGGGGGCGAGG 
4 1 N N N N N N N N tJ D D K N N 
^2 SAAAAAAGGHHSGD 
4 3 N S N H K N M H N C G N N
Table 13, continued.



4 

i 



i 



R Jf 

44 

45 

46 

47 

48 

49 

50 

51 

52 

53 

54 

55 

56 

57 

58 

59 

60 



20 21 22 23 
R R R H 



F 
K 
T 

r 

E 
E 
C 
R 
R 
T 
C 
V 
V 



24 25 26 
' U U U 
F 
K 
T 
I 
D 
E 
C 
Q 
R 
T 
C 
A 
A 
K 
Y 
C 



27 28 29 
N K H 



30 31 32 33 
N N H R 



F 
K 
T 
W 
D 
E 
C 
R 
H 
T 
C 
V 
A 
S 
C 
I 



F 
V 
T 
E 
K 
£ 
C 
L 
Q 
T 
C 
R 



F 
K 
T 
E 
T 
L 
C 
R 
C 
E 
C 
L 
V 
Y 
P 



F 

S 
T 
L 

A 



20 Dendroaspis anausticeps (Eastern Green Ma*-aba) 
C13 S2 C3 toxin {DUFT85) 

21 Dendroflsois pclyloois polvlcpes (Black manba) B toxin 
(DUFT85) 

2 2 D ond r Q.^sDis oolvleois colvlcpos (Black Mamba) E toxin 
(UUFxis) 

23 V^parA flr.modvtos TI toxin (DUFT85) 

24 Uioerft ar^norivtos CTI toxin fDUFT85) 

25 Huncarus f^sciatus vril B toxin (DUFT35) 

2 6 Aner.onia sulcata (sea aner.onc) 5 II (L!UFT85) 

27 Ho::io sapiens HI-1-; "inactive" dor.ain ' (Di;FT85) 

28 Horr.o sapiens Hr-14 "active" domain (DUFT85) 

29 beta bungarotoxin 81 (DUFTS5) 

30 beta bungarotox in D2 (OaFTSS) 

31 Bovine spleen TI II (FI0RS5) 

32 T.TChvpleus triricntntns (Horseshoe crab) hemocyte 
inhibitor (NA>0\3 7) 

33 Bor.bvx nori (silkworm) SCI-III (SASAS4) 

Notes:

a) both beta bungarotoxins have residue 15 deleted.

b) B. mori has an extra residue between C5 and C14; we
have assigned F and G to residue 9.

c) all natural proteins have C at 5, 14, 30, 38, 51, & 55.

d) all homologues have F33 and G37.

e) extra C's in bungarotoxins form interchain cystine
bridges.



Table 14: Tally of Ionizable Groups,
BPTI homologues.



Sequence 
Identi f ier 

1 

2 

3 

4 

5 

6 

7 

8 

9 
XO 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
32 
33 - 



c 



Sequences given in Table 13.

+ is sum of K + R + NH - D -
molecule at pH 7.0

Σ is sum of K + R + NH + D +
groups at pH 7.0.



NH 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 
1 
1 
1 
1 
1 

1 . 

1 

1 

1 

1 

1 ' 

1 

1 

1 

1 

1 



C02 

1 

1 

1 

I 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

I 

1 

1 

1 

1 

1 



+ 

6 
6 
6 

-1 
2 
5 
5 
5 
5 
5 
5 

11 

10 
2 
4 
3 
7 
4 

11 
8 
5 
7 
3 
5 
5 
2 

-1 
2 



} 

16 
16 
16 
13 
16 
15 
15 
15 
15 
15 
19 
19 
18 
14 
16 
19 
21 
8 
17 
20 
13 
13 
15 
17 
13 
16 
11 
14 
22 
23 
16 
13 
17 



E - CO2, approximate charge on
E + CO2, i.e. number of ionized
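The two tallies defined in the footnotes of Table 14 can be sketched in a few lines. This is a minimal illustration, assuming the footnotes read "+ = K + R + NH - D - E - CO2" and "Σ = K + R + NH + D + E + CO2", that every chain has one free amino terminus (NH) and one free carboxy terminus (CO2), and that sequence 1 is BPTI as given in the BPTI column of Table 15. The function name is hypothetical.

```python
# BPTI sequence, residues 1-58, as listed in the "BPTI" column of Table 15.
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"


def tally_ionizable(sequence):
    """Return (approximate net charge, number of ionized groups) at pH 7.

    Assumes one free N-terminal amino group (NH) and one free C-terminal
    carboxylate (CO2) per chain, as in the NH and CO2 columns of Table 14.
    """
    lys = sequence.count("K")
    arg = sequence.count("R")
    asp = sequence.count("D")
    glu = sequence.count("E")
    nh = 1
    co2 = 1
    net_charge = lys + arg + nh - asp - glu - co2
    ionized_groups = lys + arg + nh + asp + glu + co2
    return net_charge, ionized_groups


NET, TOTAL = tally_ionizable(BPTI)
print(NET, TOTAL)
```

For BPTI (4 Lys, 6 Arg, 2 Asp, 2 Glu) this gives a net charge of 6 and 16 ionized groups, in agreement with the first entries of the "+" and "Σ" columns.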
Table 15: Amino acids observed at each Residue,
BPTI homologues



i 



Res . 
-5 
-< 
-3 
-2 
-I 
1 
2 



5 
6 
7 
8 

10 

11 

12 
13 
14 
15 
16 
17 
13 
19 
20 

2 1 
22 
23 
2A 
25 
26 
27 
23 
29 
30 

3 1 
32 
33 
34 
35 
36 
37 
38 
39 



Different
AAs

2 

2 

5 
10 
10 
10 

9 
10 

7 

1 
10 

5 

7 

9 
10 
10 

2 

5 

3 
12 

7 
12 

6 

7 

5 

A 

6 
.2 

4 
10 

9 

5 

7 
10 

1 

7 
11 

1 
11 



Contents 
D -32 
E -32 

T P F 2 -29 

Z3 R3 Q2 T2 H C L K E -18 
D*; T2 P2 Q2 E C M K R -13 
R21 A2 K2 H2 P L r T C D 
F2 0 R4 A2 H2 N E V F L 
D15 K6 T3 R2 P2 S Y C A L 
F19 D4 L3 Y2 12 A2 S 
C33 

Lll f:5 »4 K3 Q2 
L18 £11 K2 S Q 
P26 H2 A2 
P17 A6 V3 
Yll E7 D4 



12 Y2 02 T R 



I L C F 

R2 Q L K Y F 

A2 IJ2 R2 V2 S 



I 0 



T17 P5 A3 R2 X S Q Y V K 
C32 K 

P22 R6 L3 N I 
C31 T A 

K15 R4 Y2 H2 L2 
A22 G5 Q2 



-2 V G A r N F 



R K D F 
R12 K5 A2 Y3 H2 S2 F2 
121 M< F3 L2 V2 T 
111 PIG R6 S2 K2 L Q 
R19 A7 S4 L2 Q 
Y:3 F13 W I 
F14 V14 H2 A rJ S 
Y32 F 

N2e K3 03 S 

A12 S5 Q3 P3 W3 L2 T2 

K16 A6 T2 E2 S2 R2 C 

A18 S3 K3 L2 T2 

C13 KIO N5 Q2 R H M 

LO Q7 K7 A2 F2 . R2 M G 

C33 

Q12 Ell LA K2 V2 Y tl 
P5 K'\ Q3 E2 L2 



L M T G P 



K C 
H V 



T U 



G V S R A 



T12 
F33 

Vll 13 T3 02 N2 Q2 F H P R K 
Y31 W2 
C77 S5 R 
G33 

C31 T A . 
R13 CO K-t Q3 



BPTI 



R 
P 
D 
F 
C 
L 
E 
P 
P 
Y 
T 
C 
P 

c 

K 
A 
R 

T 
I 

R 
Y 
F 
Y 

n 

A 
K 
A 
C 
L 
C 
Q 
T 
F 
V 
Y 
C 
C 



02 P M 
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Table 15: continued. 



Res. 
40 
41 
42 
43 
44 
45 
46 
47 
48 
49 
50 
51 
52 
53 
54 
55 
56 
57 
58 
59 
60 
61 
62 
63 
64 



Number
Different
AAs
2 

3 
9 

2 * 

3 

2 

S 

2 

9 

7 

6 

1 

7 

R 

7 

1 

8 

8 

8 

9 

6 . 

3 

2 

2 

2 



Con 
G22 
K20 
All 
• N31 
N21 
F32 
K24 
Tl'9 
All 
E19 
E16 
C33 
R13 
R21 
T2 3 
C33 
G15 
G19 
All 
-24 
-28 
-31 
-32 
-32 
-32 



tents 
All 

Kll D2 

R9 S4 G3 H2 D Q K N 
G2 

Rll K 
Y 

E2 S2 D H V V R 
S14 

19' E4 T2 W2 L2 R K D 
D6 A2 Q2 K2 T H 
D12 L2 M Q K 



MIO L3 E3 Q2 H V 
Q3 E2 H2 C2 G K D 
A3 V2 E2 I Y K 

V8 13 E2 R2 A L S 
V4 A3 P2 -2 R L N 



-10 P3 K3 S2 

G2 Q E A Y S 

Q R I G D . 

T P 

D 

K 

S 



Y2 R 
P R • 



A 

K 
R 
N 
U 
F 
K 
S 
A 
E 
D 
C 
M 
R 
T. 
C 
G 
C 
A 



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Table 16: Exposure in BPTI 
Coordinates taken from
Brookhaven Protein Data Bank entry 6PTI.

HEADER PROTEINASE INHIBITOR (TRYPSIN) 13-MAY-87
COMPND BOVINE PANCREATIC TRYPSIN INHIBITOR
COMPND 2 (/BPTI$, CRYSTAL FORM /III$)
AUTHOR A.WLODAWER

Solvent radius = 1.40
Atomic radii given in Table 7

Areas in Angstroms-squared.

                       Not                 Not
            Total    covered             covered
Residue      area    by M/C   fraction   at all   fraction

ARG  1     342.45    205.09    0.5989    152.49    0.4453
PRO  2     239.12     92.65    0.3875     47.56    0.1989
ASP  3     272.39    158.77    0.5829    143.23    0.5258
PHE  4     311.33    137.82    0.4427     43.21    0.1388
CYS  5     241.06     48.36    0.2006      0.23    0.0010
LEU  6     280.93    151.45    0.5391    115.87    0.4124
GLU  7     291.39    128.91    0.4424     90.39    0.3102
PRO  8     236.12    128.71    0.5451     99.98    0.4234
PRO  9     236.09    109.82    0.4652     45.80    0.1940
TYR 10     330.97    153.63    0.4642     79.49    0.2402
THR 11     249.20     80.10    0.3214     64.99    0.2608
GLY 12     184.21     56.75    0.3081     23.05    0.1252
PRO 13     240.07    130.25    0.5426     75.27    0.3136
CYS 14     237.10     75.55    0.3186     53.52    0.2257
LYS 15     310.77    200.25    0.6444    192.00    0.6178
ALA 16     209.41     66.63    0.3182     45.59    0.2177
ARG 17     351.09    243.67    0.6940    201.43    0.5739
ILE 18     277.10    100.51    0.3627     58.95    0.2127
ILE 19     278.03    146.06    0.5254     96.05    0.3455
ARG 20     339.11    144.65    0.4266     43.81    0.1292
TYR 21     333.60    102.24    0.3065     69.67    0.2089
PHE 22     306.08     70.64    0.2308     23.01    0.0752
TYR 23     338.66     77.05    0.2275     17.34    0.0512
ASN 24     264.88     99.03    0.3739     38.69    0.1461
ALA 25     211.15     85.13    0.4032     48.20    0.2283
LYS 26     313.29    216.14    0.6899    202.84    0.6474
ALA 27     210.66     96.05    0.4560     54.78    0.2601
GLY 28     186.63     71.52    0.3823     32.09    0.1713
LEU 29     280.70    132.42    0.4718     93.61    0.3335
CYS 30     238.15     57.27    0.2405     19.33    0.0812
GLN 31     301.15    141.80    0.4709     82.64    0.2744
THR 32     251.26    138.17    0.5499     76.47    0.3043




Table 16, continued.

                       Not                 Not
            Total    covered             covered
Residue      area    by M/C   fraction   at all   fraction

PHE 33     304.27     59.79    0.1965     18.91    0.0622
VAL 34     251.56    109.73    0.4364     42.36    0.1684
TYR 35     332.64     80.52    0.2421     15.05    0.0452
GLY 36     187.06     11.90    0.0636      1.97    0.0105
GLY 37     185.28     84.26    0.4548     39.17    0.2114
CYS 38     234.56     73.64    0.3139     26.40    0.1125
ARG 39     417.13    304.62    0.7303    250.73    0.6011
ALA 40     209.53     94.01    0.4487     52.95    0.2527
LYS 41     314.60    166.23    0.5284    108.77    0.3457
ARG 42     349.06    232.83    0.6670    179.59    0.5145
ASN 43     266.47     38.53    0.1446      5.32    0.0200
ASN 44     269.65     91.03    0.3373     23.39    0.0867
PHE 45     313.22     69.73    0.2226     14.79    0.0472
LYS 46     309.83    217.18    0.7010    155.73    0.5026
SER 47     224.78     69.11    0.3075     24.80    0.1103
ALA 48     211.01     82.06    0.3889     31.07    0.1473
GLU 49     286.62    161.00    0.5617    100.01    0.3489
ASP 50     299.53    156.42    0.5222     95.96    0.3204
CYS 51     238.63     24.51    0.1027      0.00    0.0000
MET 52     293.05     89.43    0.3054     66.70    0.2276
ARG 53     356.20    224.61    0.6306    189.75    0.5327
THR 54     251.53    116.43    0.4629     51.64    0.2053
CYS 55     240.40     69.95    0.2910      0.00    0.0000
GLY 56     184.66     60.79    0.3292     32.78    0.1775
GLY 57     106.58     49.71    0.4664     38.28    0.3592
ALA 58     no position given in Protein Data Bank

"Total area" is the area measured by a rolling sphere
of radius 1.4 A, where only the atoms
within the residue are considered. This
takes account of conformation.

"Not covered by M/C" is the area measured by a rolling sphere
of radius 1.4 A where all main-chain atoms
are considered; fraction is the exposed
area divided by the total area. Surface
buried by main-chain atoms is more
definitely covered than is surface covered
by side group atoms.

"Not covered at all" is the area measured by a rolling sphere
of radius 1.4 A where all atoms of the
protein are considered.
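The "fraction" columns of Table 16 follow directly from these definitions: each is an exposed area divided by the residue's total area. The sketch below shows that arithmetic only; it does not compute the rolling-sphere areas themselves, and the function name is hypothetical. The sample values are the ASP 3 and TYR 35 rows of the table.

```python
def exposure_fractions(total_area, not_covered_by_mc, not_covered_at_all):
    """Return the two exposure fractions of Table 16, rounded to 4 places.

    total_area          area of the isolated residue (Angstroms squared)
    not_covered_by_mc   area left exposed when main-chain atoms are present
    not_covered_at_all  area left exposed when all protein atoms are present
    """
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return (
        round(not_covered_by_mc / total_area, 4),
        round(not_covered_at_all / total_area, 4),
    )


# ASP 3: total 272.39, not covered by M/C 158.77, not covered at all 143.23
ASP3 = exposure_fractions(272.39, 158.77, 143.23)
# TYR 35: total 332.64, not covered by M/C 80.52, not covered at all 15.05
TYR35 = exposure_fractions(332.64, 80.52, 15.05)
print(ASP3, TYR35)
```

A residue such as CYS 55, with 0.00 area not covered at all, has a fraction of zero and is fully buried by this measure.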








Table 17: Plasmids used in Detailed Example

Phage     Contents

LG1       M13mp18 with Ava II/Aat II/Acc I/Rsr II/Sau I adaptor
pLG2      LG1 with amp^R and ColE1 of pBR322 cloned
          into Aat II/Acc I sites
pLG3      pLG2 with Acc I site removed
pLG4      pLG3 with first part of osp-pbd gene
          cloned into Rsr II/Sau I sites,
          Avr II/Asu II sites created
pLG5      pLG4 with second part of osp-pbd gene
          cloned into Avr II/Asu II sites, BssH I
          site created
pLG6      pLG5 with third part of osp-pbd gene
          cloned into Asu II/BssH I sites, Bbe I
          site created
pLG7      pLG6 with last part of osp-pbd gene
          cloned into Bbe I/Asu II sites
pLG8      pLG7 with disabled osp-pbd gene, same
          length DNA.
pLG9      pLG7 mutated to display BPTI (V15 BPTI)
pLG10     pLG8 + tet^R gene - amp^R gene
pLG11     pLG9 + tet^R gene - amp^R gene


Table 18: Enzyme sites eliminated when
M13mp18 is cut by Ava II
and Bsu36 I






Aha II 
Fsp I 
EcoR I 
Sna I 
Hind III 
Hind II 



Aat II 
- Ebv II 

BstS I 

Eco57 I 

Esp I 

Khe I 

Pf IM I 

EST I 

Spe I 
' Xca I 



Kar I 
Bgl I 
Sac I 
BanH I 
Acc I 



Gdi II 
HqlE II 
Kpn I 
Xba I 
P5it I 



Pvu I 
Bsu36 I 

Sa I I 
Sph I 



Table 19: Enzymes not cutting
M13mp18



Af I I 
Bcl I 
BstE II 
Ecoff I 
Hpa .1 
Not I 
PmaC I 
Sac I 
Stu I 
Xho I 



Apa I 
DspM I 
BstX I 
ECOO109 I 
Mlu I 
N'ru I 
Ppa I 
Sea I 
Sty t 



r 

Ec_23 V 
F::-jM I 

sfi r 





264 



10 



15 



Aat II 
Sea I 
Pvu I 
Hind II 
Kde I 



Table 20: Enzymes cutting
Amp^R gene and ori



fibv II 
Tthlll I 
FGB I 
Pst I 



Eco57 I 
Aha II 

Xbft I 



Ppa I 
Gdi II 
HaiE II 
Af { III 



■V-.. ' • 



m 



m 

1^ 



I 



T.ble :i: Enr.yr.os « ==ed on A^.-iq CNA 



£nzyTr.e 
^Acc I 
Afl II 
Apa I 
Asu II 
Avj III 



10 



25 



Recoani wion 
GTMKAC 
CTTAAG 
GGCCCC 
TTCCAA 
ATCCAT 



Avr ir 

33r.H I 
BCl I 
3r»cM i: 
assH II 

+Dr3 i: 
I 

gc::5 I 

+ I 
Hir.d III 
HPfl I 
Kpn - I 



I 

I 



i5 



Hoc I 

pr^aC I 
+ Pm^ I 
+Rsr ri 
Sac I 

sal I 
+Sau I 

.5X1 I 
Sjra I 
Spe I 
So^ I 
St;j r 



CCTAGG 
CGATCC 
TGATCA 
TCCGGA 
CCGCGC 

ggt::acc 
ccakiisnn 
rgg:xcv 

GAATrC 
GATATC 
GCTSACC 
AAGCTT 
CTT-^C 
GGTACC 

ACGCGT 

GCCGCC 

CCATCG 

GCTAGC 

GCGGCCGC 

TCGCGA 

cca::!^nn:i 

CACCTG 
RGGWCCY 
CGG*.''CCG 
GAGCTC 

GTCGAC 
CCTN'ACC 



ACTAGT 
CCATGC 
AGGCCT 
CCV«GG 
GTATAC 



Synvn 
P 
P 
P 
P 
? 

P 
? 
P 
? 
? 
P 
p 

P 
? 
? 
P 
? 
P 
? 
F 



cuts 
Z i 

1 £. 
5 4 

2 i 
5 i 



P 
P 
P 
? 
P 
P 
P 
? 
P 
P 
P 

? 

V 



1 

8 
?. 
5 
1 

2 
I 

: c. 

5 Si 



5 i 

1 u 

2 & 



4 
5 
1 

1 

S 
3 
5 
5 
5 
6 

4 

3 

o 

5 
3 
5 
S 

z 
I 

5 

5 
5 



Supply 

<a,M. r.N, ?,T 
<n 

^7; I:M.N/P.T: 
IcPli: I:T. 



<S. 3,>'. 
<S . 3,M 

<:) 

<:i,T 

<S,3.M.N,T 

<M.T ; 
<•! (soon) 
<S.3.M. 
<3,3.M, I.N 



r^cpOl09 I :N 



P.T 



<S 3,M. I.N, ?,T 
.;S,3.M. r.N,? ,T 

<s,3,y.. r.N,?,T ; 

<M,:.'. P.T 
< 3 , :<\ r 

cM.N, ?.T 
6 < . N . ? . 
3 <3.^<.N,7 

; <:i 
3 <ncne 
5 <U 
5 <N,T 

L <3(^ I). M.I.N,?. 
T 

5 <B.M..''.N,?.T 
5 •-.M; -3' 

:T; ^ 

5 <.Y,N.r 

L <3.y..i.N.?.r 

3 N,:tA^ I) 

5 <N,P.T 
2 <ii;sccn) 



Table 22: pbd gene

pbd mod10 29III88 :
lacUV5 Rsr II/Avr II/gene/TrpA attenuator/Mst II; !
5'- CGGaCCG TaT                                    ! Rsr II site
CCAGGC tttaca CTTTATGCTTCCGGCTCG tataat GTG        ! lacUV5
TGG aATTGTGAGCGGATAACAATT                          ! lacO operator
CCT AGGAgg CtcaCT                                  ! Shine-Dalgarno seq.

atg aag aaa tct ctg gtt ctt aag gct agc    !  10, M13 leader
gtt gct gtc gcg acc ctg gta ccg atg ctg    !  20
tct ttt gct cgt ccg gat ttc tgt ctc gag    !  30
ccg cca tat act ggg ccc tgc aaa gcg cgc    !  40
atc atc cgt tat ttc tac aac gct aaa gca    !  50
ggc ctg tgc cag acc ttt gta tac ggt ggt    !  60
tgc cgt gct aag cgt aac aac ttt aaa tcg    !  70
gcc gaa gat tgc atg cgt acc tgc ggt ggc    !  80
gcc gct gaa ggt gat gat ccg gcc aaa gcg    !  90
gcc ttt aac tct ctg caa gct tct gct acc    ! 100
gaa tat atc ggt tac gcg tgg gcc atg gtg    ! 110
gtg gtt atc gtt ggt gct acc atc ggt atc    ! 120
aaa ctg ttt aag aaa ttt act tcg aaa gcg    ! 130
tct taa tag tga ggttacc                    ! BstE II

agtcta agcccgc ctaatga gcgggct tttttttt    ! terminator
CCTgAGG -3'                                ! Mst II

Table 23: pbd DNA sequence

DNA Sequence file = UV5_M13PTIM13.DNA;17
DNA Sequence title =
pbd mod10 29III88 : lac-UV5 Rsr II/Avr II/gene/TrpA
attenuator/Mst II; !

  1 C|GGA|CCG|TAT|CCA|GGC|TTT|ACA|CTT|TAT|GCT|TCC|GGC|TCG|
 41 TAT|AAT|GTG|TGG|AAT|TGT|GAG|CGG|ATA|ACA|ATT|CCT|AGG|AGG|
 83 CTC|ACT|ATG|AAG|AAA|TCT|CTG|GTT|CTT|AAG|GCT|AGC|GTT|GCT|
125 GTC|GCG|ACC|CTG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|CGT|CCG|GAT|
167 TTC|TGT|CTC|GAG|CCG|CCA|TAT|ACT|GGG|CCC|TGC|AAA|GCG|CGC|
209 ATC|ATC|CGT|TAT|TTC|TAC|AAC|GCT|AAA|GCA|GGC|CTG|TGC|CAG|
251 ACC|TTT|GTA|TAC|GGT|GGT|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|
293 AAA|TCG|GCC|GAA|GAT|TGC|ATG|CGT|ACC|TGC|GGT|GGC|GCC|GCT|
335 GAA|GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|TTT|AAC|TCT|CTG|CAA|
377 GCT|TCT|GCT|ACC|GAA|TAT|ATC|GGT|TAC|GCG|TGG|GCC|ATG|GTG|
419 GTG|GTT|ATC|GTT|GGT|GCT|ACC|ATC|GGT|ATC|AAA|CTG|TTT|AAG|
461 AAA|TTT|ACT|TCG|AAA|GCG|TCT|TAA|TAG|TGA|GGT|TAC|CAG|TCT|
503 AAG|CCC|GCC|TAA|TGA|GCG|GGC|TTT|TTT|TTT|CCT|GAG|G

Total = 539 bases
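A short consistency check of the pbd sequence is sketched below. It assumes the sequence as read from Tables 22 and 23 with scanning damage resolved against the codon listing and the named restriction sites, the standard genetic code, and a start codon at base 89; the helper names are hypothetical. It confirms the 539-base total, translates the open reading frame, and locates two recognition sequences (Xho I, CTCGAG; Acc I, GTATAC).

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Rows of Table 23 with the "|" separators replaced by spaces (assumed reading).
ROWS = """
C GGA CCG TAT CCA GGC TTT ACA CTT TAT GCT TCC GGC TCG
TAT AAT GTG TGG AAT TGT GAG CGG ATA ACA ATT CCT AGG AGG
CTC ACT ATG AAG AAA TCT CTG GTT CTT AAG GCT AGC GTT GCT
GTC GCG ACC CTG GTA CCG ATG CTG TCT TTT GCT CGT CCG GAT
TTC TGT CTC GAG CCG CCA TAT ACT GGG CCC TGC AAA GCG CGC
ATC ATC CGT TAT TTC TAC AAC GCT AAA GCA GGC CTG TGC CAG
ACC TTT GTA TAC GGT GGT TGC CGT GCT AAG CGT AAC AAC TTT
AAA TCG GCC GAA GAT TGC ATG CGT ACC TGC GGT GGC GCC GCT
GAA GGT GAT GAT CCG GCC AAA GCG GCC TTT AAC TCT CTG CAA
GCT TCT GCT ACC GAA TAT ATC GGT TAC GCG TGG GCC ATG GTG
GTG GTT ATC GTT GGT GCT ACC ATC GGT ATC AAA CTG TTT AAG
AAA TTT ACT TCG AAA GCG TCT TAA TAG TGA GGT TAC CAG TCT
AAG CCC GCC TAA TGA GCG GGC TTT TTT TTT CCT GAG G
"""
PBD_DNA = "".join(ROWS.split())


def translate(dna, start):
    """Translate from 0-based index `start` up to the first stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def find_sites(dna, recognition):
    """Return the 1-based start of every occurrence of a recognition sequence."""
    hits = []
    pos = dna.find(recognition)
    while pos != -1:
        hits.append(pos + 1)
        pos = dna.find(recognition, pos + 1)
    return hits


# The ATG start codon is base 89 in the numbering of Table 23 (index 88).
PROTEIN = translate(PBD_DNA, 88)
print(len(PBD_DNA), len(PROTEIN))
print(find_sites(PBD_DNA, "CTCGAG"), find_sites(PBD_DNA, "GTATAC"))
```

Under these assumptions the translation reads as a 23-residue leader, the 58 residues of BPTI, and a 50-residue carboxy-terminal segment. The positions returned are the first base of each recognition sequence, which differ by a small offset from the cut positions tallied in Table 24.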




Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Table 24: Summary of Restriction Cuts

Enz = %Acc I has 1 observed sites : 259
Enz = Acc III has 1 observed sites : 162
Enz = Acy I has 1 observed sites : 328
Enz = Afl II has 1 observed sites : 109
Enz = %Afl III has 1 observed sites : 404
Enz = Aha III has 1 observed sites : 292
Enz = Apa I has 1 observed sites : 193
1 observed sites : 133 
I observed sites : 47 1 
I observed sittis : 175 
1 cbscn.-ed sites : 76 
3 obser%-ed sites : 1 38 328 540 
1 observed sites : 328 
1 obccr"/ed sites : 352 
1 observed sites : 346 

• ii-±iLi - I obser'-'ed sites : 319 

Enz = BsSh^II has 1 observed sites : 205 
v gstE II has 1 observed sites : 4S3 
%3stX I has 1 obser-ved sites : 413 
Cfr 1 has 2 observed sites : 299 350 
+ Dra II has L observed sites : 193 
-t-Eso I has 1 obsor-^ed sites : 277 

1 observed sites : 213 

2 obser-^ed sites : 299 350 
1 obser-ved sites : 240 

I obser'^ed sites : 323 
1 obser'/ed sites : 473 

^ _ .3 obser-zed sites : 133 323 54 0 

tHqiJ II has 1 observed sites : 103 
Hind III. has 1 observed sites : 377 
^ij^ I has 1 obser^'ed sites : 340 
>:pn I has 1- observed sites : 133 
+tLbo II has 2 obser/ed sites : 93 304 

Mlu I has 1 observed sites : 404 

Enz = Mar I has 1 cbser/ed sites : 323 
Enz = Kco I has 1 otscr-zed sites : 413 
Enz = Uhe I has 1 observed sites : 115 
Enz = Mru I has 1 obscr^/ed sites : 123 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 



Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 




N5Df7 524) has 1 obser-zed sites : 311 
?;3cO II has 1 observed sites : 332 
+ Pf IM I has 1- obser\'ed sites : 134 

*£ss I has 1 obscr\-ed sites : 193 
+ Rsr II has 1 obscr/ed sites : 3 

vs^u I has 1 observed sites : 535 
i SfaM I has 2 observed sites : 144 209 
+5X1 I has 1 observed sites : 351 

Soh I has 1 observed sites : 311 

Stu I has 1 observed sites : 240 

^ Stv I has 2 observed sites : 76 413 
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■ ■;Vtr.:V-^'*'j;:*\. •.ij:*'■?v='V^^:^^^;;; :"V--r;-/:v:*'-^' -;; '••* rfr^-". 



Enz 
Enz 



JC.ca I 
Xho I 



has 
has 



Table 24, continued. 

1 observed sites : 
1 observed sites : 



259 
175 



Enz = Xaa III has 1 observec sites 



299 



Enzynes that do not cut" 





Tthl 



n 




Ava III 
3cl I 
Dra III 
HqlA r 
Nde I 
Pst I 
Sal I 

SSD I 

Xr.a r 





c 
Table 25: Annotated Sequence of pbd gene

5'- CtGGA|CCCtTAT|CCAiCCC[TTT|ACA|CTT|TATl 
I Rsr TI I J Zjn L 

I GCT I TCC I CCC I TCG 1 TAT i AAT | CTG I TCC I 
1 -10 L 



I .VAT 1 TCT [ GAG I CCC I ATA | ACA ! ATT I 
I Uic operator — L 



|CCT|ACC|ACC|CTCl ACT| 
I Avr III 



1 

ATG 



k 
2 

AAG 



V a 
I ll| 12 
(GT^CCT 



k 1 s 


1 


V 


I 




I 


s 


3 -1 


5 


6 


7 






10 


AAA 1 TCT 


CTC 


GTT 




A.-\G 


CCT 


AGC 








Afl IT 


fi^e I ; 


V 1 a 


t 


1 


V 


P 


n 


So! 


13 14 


IS 


16 


17 


13 


19 




CTC CCC 


ACC 


CTC 


CTA 


CCC I ATG 1 CTC i 


1 Nru 11 


1 


Kun 









s 

21 
TCT 



P 
31 
CCG 



C 

22 
TTT 



P 

32 
CCA 



a 

23 

ccr 



y 

33 
TAT 



r 

24 
CCT 



P 

25 
CCC 



[ Accirii 



26| 27 
GATjTTClTCT 



1 




29 


Jol 


CTC 


GAG 1 


Av.-^ T i 


x' ho r ! 



28 



52 



73 



83 



118 



148 



178 



t 
34 
ACT 



p I c I k 
3g| 37: 33 
CCCjTGCl AAA 



PflM I 



g 

35 
GCG 

_L 

A Pel T I 
Dra 11 I 
Pss I 1 



a 

3 9 
CCG 



r I 
40! 



RssH :rl 



i 


1 


r 1 y 


t 


y 


n 


a 


41 


42 


43 44 


45 


46 


47 


46 


ATC 


ATC 


CCT 1 TAT 


TTC 


TAG 


A.\C 


CCT 



k 

491 
Aj\A| 



Si 







/ 




3 
Table 25, continued.



a 


9 


1 


50 


51 


52 


CCA 


CCC 


CTC 


1 


Stu 


II. 


c 


r 


a 


61 


62 


61 


TCC 


CCT 


CCT 








s 


a 


c 


70 


71 


72 


TCC 


CCC 


CAA 


1 Xnartr 1 


g 


a 


a 


80 


81 


B2 


CCC 


CCC 


CCT 


Rbe I 




Mar I 





c 
53 
TCC 



y. 

64 



q 

54 
CAC 



r 

65 
CCT 
1 



t 

55 
ACC 



661 



d 
73 
CAT 



e 
83 
CAA 



C 

74 
TCC 



75 
ATC 
SDh Tl 



f 


V 


y 


56 


57 


53 


TTT 


CTA 


TAC 




-ACC r 




Xca I 


n 


f 


k 


67 




69 


/xAC 






r 


t 


C 


76 


77 


78 


CCT 


ACC 


TCC 



59| 60 
CCT I CCT 



268 



79| 
CCT I 



295 



325 



84 
CCT 



I P 
87 

CCC 



a I Jc I a 
88| 39 90 
CCC I ;w\A I CCC 
Sfi I ' 



d 

85 
CAT 



a 

91 
CCC 



d 

86 
CAT 



;*".v..v!: 

t:::f:Z 





Table 25, continued. 



i 


V 


g 


& 


c 


i 


g 1 i- 


113 


114 


115 


116 


117 


113 


119 1 120 


ATC 


CTT 


CGT 


CCT 


ACC 


ATC 


CCT 1 ATC 



k 


1 


f 




k 1 f 


121 


122 


123 


124 


125| 126 


AAA 


CTC 


rrv 


/vAC 


AAAITTT 



t I s k a 
127 128 129| 130 I 
ACtItCc! AAAtCCcI 



s 








131 


132 


133 


134 


TOT 


TAA 


TAG 


TCA 



CCT ! TAC I CAC I TCT I 

sstE rn 



502 



I AAC I CCC ! CCC [ TAA I TCA | CCC | CCC ] TTT [ TTT \ TTT 1 
I Trp tern ino tor i 



532 



ICCT|CAC|C -3' 
I sail T L 



539 



Note the following enzyme equivalences,

Xma III = Eag I
Acc III = BspM II

Dra II = EcoO109 I
Asu II = BstB I
Table 26: DNA_seq1



5' [ccg|tCciqtC|CCAlCCC|TAT|CCA|GCC|TTT| ACA|Ci-riTAT| 
spacer I Rsr II I J -3^ 



I CCT I TCC I GCC I TCC I TAT | AAT I CTC j TCC | 

' I -io I 



I AAT I TGT t GAG | CCC j ATA | ACA | ATT 
I lac operator 



jCCT|ACG 







a 


128 


129 


130 


TCG 


i\AA 


CCG 



jgccIgcttccT 
. I spacer IAsj til 



1^ 



m 



^■■■iS. 



m 



s 








131 


1:2 


.33 


134 


TCT 


TA^ 


TAG 


TGA 



GGT! TAG I GAG I TCT I 

Bsts in 



I AAG I CCC ( CCC I TAA | TGA j CCG j CGC | TTT | TTT | TTT | 
I Trp tern;nator L 



[CCT|CAG|Gca|qgt|gag|cg - 3' 
I Sau I I spacer [ 




Table 27: DNA_synth1



5' I err. I Tcc I err \ cga I CCG \ tat i cca ( ccc I ttt t aca I ctt i tat I 



I ccr \ Tcc 1 r .r.r | tcc I ta t | aat I ctc I tgc I 



! AAT I TCT I GAG I CCG I AT A ■ ACA | ATT { 



CCT ! AGC 1. 
gqa tcc 

/ 3' - oliq.<3 
I CCC I GCT ! CCT 1 TCG ■ AAA j CCG | 
cqq cga gqa age etc cgc 



I TCT ! TAA 1 TAG | TGA 1 GGT j TAC j CAC | TCT [ 
aga att ate act cca atg gtc aga 



|AAGlCCC|CCC|TAA|TGAiGCG!GCC!Trr|TTT[TTT; 
etc egg egg att act cgc ccg o.aa aaa aaa 



I CCT 1 CAC I GCA I GCT | GAG | CG 
gga etc cgt cca ctc yc - 5' 



"Top" strand 09 

"Bottom" strand 100 

Overlap 2 3 

Net length 158



(14 c/g and 9 a/ t.) 



Table 29: DNA_synth2
5'- ^GC^ICCAl ACCi 
ICCTI ACG I A GClCTCl ACT! 
I ATC I AAG I A.AA I TCT I CTG t CTT I CTT ■ AAG I GCT I AGC I 



I GTT I GOT I GTCl GCG I ACC I CTG I GTA I CCG 1 ATC I CTG I 
olig?6 = 3'- ggc tac gac 



/ 3' = olig^S 
[ TCT [ TT T I GCT I CGT I CCG I CAT I TTC | TGT | CTC \ GAG | 
aga aaa cga gca ggc cca aag aca gag etc 



I CCG t CCA I TAT | ACT (CGG | CCC | TGC | A^-^^ 1 GCG | CGC | 
ggc ggt ata tga ccc ggg acg ttt cgc gcg 



I ATC] ATC I CGT I 
tag tag gca 



I ACT ! TCG I i G CG I GCT 1 G CG I 
tga age tct cgc cga cgc - 5' 



"Top" strand 
"Bottom" strand 
Overlap 
Net- length 



99 
99 

24 (lA c/g and 10 a/t) 
155 



1: 



i 



P 




Table 30: DNA_seq3



I ccc| tqc| aca 
I spacer 



a 

39 
GCG 



r 

40 
CCC 



i 


i 


r 


y 


f 


41 


42 


43 


44 


45 


ATC 


ATC 


CGT 


TAT 


TTC 


a 


g 


1 


c 


q 


50 


51 


52 


53 


54 


GCA 


CGC 


CTC 


TGC 


CAG 


1 


StU 


II 






c 


r 


a 


Jc 


r 


61 


62 


■63 


64 


65 


TGC 


CGT 


GCT 


AAG 


CGT 








1 


s 


a' 


e 


d 


c 


70 


71 


72 


73 


74 


TCG 


GCC| 


GAA 


GAT 


TGC 



461 47 
TAC I A^\C 



t 

55 
ACC 



a 

48 

GCT 



k 

49 
i\AA 



f 


V 


y 


<? 


g 


56 


57 


58 


59 


60 


TTV 


GTA 


TAC 


GCT 


GCT 




Acc T 








Xca I 







n n 
66 I 67 
AAC| AAC 



f 

68 
TTT 



|XmaIir| 



SI 

75 
ATG 



! soh ri 



76 77 
CGT ACC 



k 
69 
AAA 



c q 
73 79 
TGC GCT 



g 


a 




80 


81 




CGC 


CCC 


get 1 gaa 


Bbe I 


soacer 


Nar r 







t 


s 


k 




127 


128 


129 


Ittt 


acT 


TCG 


AAa 



I. 

r :.■ - 

i 




r 
Table 31: DNA_synth3
5 * - I CCC I TGC t ACA 1 CCC I CCC I 
) ATC i ATC I CGT i TAT ' TTC I TAC ! AAC ( GCT I A AA I 



I CCA I CCC I CTG I TGC = CAG I ACC I TTT ! CTA I TAG | GOT ! GGT I 
oiigiS = 3'- q cca cca 



/ 3' « oligS7 
f TGC I CGT ! GCT I .AAG : CGT I AAC I AAC 1 TTT | AAA [ 
acq gca cga ttc gca ttq tcg aaa ttt 



I TCG I GCC [ GAA | CAT I.TCC [ ATG ( CGT | ACC [ TGC | GGT | 
age egg ctt eta acg cac gca egg acg cca 



I CCC [GCC I GCT I GAA I 
ccg egg cgt etc 



I TTT 1 ACT ] TCG I AAA I CCG i TCG ] CCC ; 
aaa tga age ttt cgc age gge -5' 



"Top" strand 
"Bottoa" strand 
Overlap 
Net length 



93 
97 

25 (15 g/c & 10 a/t) 

146 





Q 



0 



so 



Table 32: DNA_seq4









g 


a 


a 


e 


g 


5' 






80 


81 


82 


S3 


84 


1 cct 1 cgc| cct 


GGC 


GCC 


GCT 


GAA 


CCT 


1 soacer 


Bbe I 














Uar T 










P 


a 




a 


a 








87 


88 


89 


90 


SI 








CCG 


GCC 


AAA 


CCG 


GCC 







d I d I 
8S| 861 

GAT [cat; 



I Sfi I 



m 



ii 



m 





t 

92 
TTT 



e 
101 
GAA 



y 

102 
TAT 



a Si 
icq] 109 
CCC| ATC 



93 
AAC 



1 

103 
ATC 



V 

110 
GTC 



s 

94 
TCT 



g 

104 
CCT 



1 

95 
CTC 



a a 

9*6 I 97 
CAA CCT 



98 

rcT 



99 
CCT 



t 
lOO 
ACC 



[Hind 31 



Y 
105 
TAC 

1 



a 
106 
GCC 



w 
107 
TGC 



111 i 112 
CTG i CTT 



BstX I 



N'co I! 



1 
113 
ATC 



121 
AAA 



V 

114 
CTT 



1 

122 
CTG 



f 
123 
TTT 



a ! c 


i 


g 


1 i 




116! 117 


118 


119 


!l20 




GCT 1 ACC 


ATC 


CGT 


lATC 




k 1 k 


f 


t 


t s 


k 


124 1 125 


126 


127 


i 123 


129 


AAC I A>JK 


TTT 


ACT 


jTCC 


AAa 






|Asu TTl .5 



qcgi 



1^ 

m 

m 
Table 33: DNA_synth4
5' I OCT I CGC I CCT'I CCC I GCC I CCT I CAA \ OCT I CAT ! GAT ' 
ICCGlCCCi AAAlCCGlCCCt 

I TTT I AAC I TCT ( CTC [ CAA I CCT ! TCT ' CCT I ACC I 



I GA A I TAT ' AT C t CGT I TAC ' CCG I TCC [ 
oligiflO ~ 3'- ata tag cca atg cgc acc 



/ 3 ' = oliqfO 
|GCC( ATGlG TG | CTC | GTT | 
egg tac cac cac caa 



I ATC [ GTT I GGT I GCT [ ACC \ ATC \ GGT | ATC | 
tag caa cca cga tgg tag cca cog 



I AAA I CTC I TTT | AAC | AAA | TTT \ ACT | TCG | AAA I CCC ! TCT \ TCA 1 
ttt gac aaa ttc ttt aaa tga age tct cgc aga act - 5' 



"Top" strand 100 
"Bottom" strand 93 

Overlap 25 (14 c/g and 11 a/t)

Net length 149 



m 





A 



1 



Res . 

3 



-4 
-3 
-2 
-1 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 - 
.3 0 
31 
32 
33 
34 
35 
36 
37 
38 
39 



0 
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Table 34: Some interaction sets in BPTI 



Number
Diff.
AAs



Contents 



BPTI 12 3 4 5 



2 

2 
5 
10 
10 
10 

9 
10 

7 

1 
10 

5 

7 

9 
10 
10 

2 

5 

3. 
12 

7 
12 

6 

7 

5 

4 

6 

2 

4 
10 

9 

5 

7 
10 
- 1 

7 
11 

1 
11 

2 

3 ' 
1 
3 
7 



E -18 
R -18 



0 -32 
E -32 

T P F Z -29 
Z3 R3 Q2 T2 H C L K 
D4 T2 P2 Q2 E C N K 
R21 A2 K2 H2 P L r T C D 
P20 R4 A2 112 N E V F L 
015 K6 T3 R2 P2 S Y C A L 
F19 04 L3 Y2 12 A2 S 
C33 

Lll E5 H4 K3 Q2 12 Y2 02 T 
L18 Ell K2 S Q 
P26 H2 A2 I L G F 
P17 A6 V3 R2 Q L K Y F 
Yll E7 D4 A2 N2 R2 V2 S 
T17 P5 A3 R2 I S Q 
G32 K 

P2 2 R6 L3 N I 
C31 T A 



I D 



Y V K 



KIS R4 Y2 H2 L2 -2 V C A I N F 

A22 G5 02 R K D F 

R12 K5 A2 Y3 H2 S2 F2 L M T C P 

121 m F3 L2 V2 T 

111 PIO R6 S2 K2 L Q 

R19 A7 S4 L2 Q 

Y18 F13 W I 

F14 YI4 H2 A N S 

Y32 F 

N2G K3 D3 S 

A12 S5 Q3 P3 W3 L2 T2 K G R 

K16 A6 T2 E2 S2 R2 C H V 

AlB S8 K3 L2 T2 

G13 KIO M5 Q2 R II M 

L9 Q7 K7 A2 F2 R2 M G T N 

C33 

Q12 Ell L4 K2 V2 Y N 

T12 P5 K4 Q3 E2 L2 G V S R A 

F3 3 

Vll 18 T3 D2 »2 Q2 F H P R K 
Y31 W2 
G27 S5 R 
G33 

C31 T A 

R13 C9 K4 Q3 02 P M 



R 
P 
D 
F 
C 
L 
E 
P 
P 
Y 

C 
P 
C 
K 
A 
R 
I 
I 
R 
Y 
F 
Y 
N 
A 
K 
A 
C 
L 
C 
Q 

T 
F 
V 
Y 
C 
G 
C 
R 



m. 
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Table 34: continued. 



Number 
Res. Diff, 

? AAs Contents 



BPTT 12 3 4 5 



40 

A 1 

42 

43 

44 

45 

46 

47 

43 

49 

50 

51 

52 

53 

54 

55 

56 

57 

53 

59 

60 

51 

62 

63 

64 



2 
3 
9 
2 
3 
2 
8 
2 
9 
7 
6 
1 

7 

8 

7 

1 

8 

3 

8 

9 

6 

3 

2 

2 

2 



C22 
N20 
All 
N31 
U21 

r32 

K2 4 
T19 
All 
E19 
E16 
C3 3 
R13 
R21 
T2 3 
C33 
CIS 
C19 
All 
-24 
-23 
-31 
-32 
-32 
-32 



All 

Kll D2 

R9 S4 C3 H2 



D Q K N 



G2 

Ril K 
Y 

E2 S2 D H V Y R 
S14 

19 E4 T2 V/2 



D6 A2 Q2 K2 
D12 L2 M Q K 



L2 R 
T H 



K D 



MIO L3 E3 Q2 H V 
Q3 E2 H2 C2 G K D 
A3 V2 E2 I Y K 



V8 
V4 



13 E2 R2 
A3 P2 -2 



-10 P3 K3 S2 

G2 Q E A Y S 

Q R I G D 

T P 

D 

K 



. L S 
: L ti 
Y2 R 
P R 



A 
K 
R 
N 
N 
F 
K 
S 
A 
E 
D 
C 
M 
R 
T 
C 
G 
G 
A 



s 

2 s 
2 



s 5 
4 s 
s 5 
s 
s 
s 
5 
5 
s 
s 
5 

X 

s 
5 
5 

X 



s indicates secondary set

x indicates in or close to surface but buried and/or
highly conserved.
Table 36: Distances, BPTI residue set #2
Distances in Angstroms between C_beta's.
Hypothetical C_beta was added to each Glycine.
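A table of this kind can be produced from a list of C_beta coordinates as sketched below. The coordinates shown are invented placeholders chosen only to make the arithmetic easy to follow, not values from entry 6PTI, and the function names are hypothetical. For a glycine, the caller is assumed to supply the coordinates of the hypothetical C_beta.

```python
import math


def distance(p, q):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cbeta_distance_table(cbeta_coords):
    """Return {(res_a, res_b): distance} for every unordered residue pair.

    cbeta_coords maps a residue label such as "R17" to the (x, y, z)
    position of its C_beta atom, in Angstroms.
    """
    labels = list(cbeta_coords)
    table = {}
    for i, first in enumerate(labels):
        for second in labels[i + 1:]:
            table[(first, second)] = round(
                distance(cbeta_coords[first], cbeta_coords[second]), 1
            )
    return table


# Placeholder coordinates (hypothetical, for illustration only).
EXAMPLE = {
    "R17": (0.0, 0.0, 0.0),
    "I19": (3.0, 4.0, 0.0),
    "Y21": (3.0, 4.0, 12.0),
}
TABLE = cbeta_distance_table(EXAMPLE)
print(TABLE)
```

Only the upper triangle is stored, which matches the triangular layout of the printed table.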



r 



119 
V21 
A27 
C2S 
L29 
Q3L 
T32 

A4 8 

y.52 



R17 
7.7 



119 Y21 A27 C28 L29 Q31 T32 



V3 4 A4 3 



?9 

Til 

K15 

A16 

113 

R20 

r2 2 

►12 4 

K26 

C30 

F33 

Y35 

S47 

D50 

C51 

R53 

R39 



15. 
22. 
26. 
22 . 
16. 
11. 

13 . 



8 , 4 
17 . 1 
20.4 
15.8 
10. 4 
5.2 
6.5 
11.0 



22.0 14,7 
23.^ l^^.3 



12.2 
13.3 
9.6 
6.3 
6-1 
11.6 
5. 4 
8.9 
e . 6 



5.2 
10. 6 
15.5 
21.7 
13 . 3 
16. 1 
10. 3 



6.3 
10.9 
13 . 0 

3.4 
12.2 

7 ■ 6 



5.4 
11.4 

8 . 3 
13.9 
11.3 



3 . 2 

a. 3 15.7 
13.3. 19.3 

1 : . : :o. 0 



5.5 

^ . 2 



14 .0 11.3 
9.5 11.2 
7.9 14.6 20.1 27. 



9.0 12.2 15 . 
13.5 19.8 22. 



5.5 10. 1 
6.1 6.0 



10. 6 
15.6 
10 .9 



15.9 
11.2 
5.4 
5.6 
9.4 



5.9 
10.9 
14 .7 
24.4 20.1 15.2 
^18.9 12 . 1 4.6 
7 . 4 
7.4 
10. 6 
. 6 



10.3 
8.4 
17.6 
20.0 13 
18.9 12 
25.4 18 
15.4 16 



7.7 
9.4 
6.6 
7.2 
4.0 12 

11.0 17 

17.1 24 



31 
23 . 
24 
13 



12.3 
7 . 2 
7.7 
9 . 5 

16.4 

21.4 

17.9 

16 

12 

15 

27 



13 . 
19. 
27. 
24 . 
20. 
M . 
10. 

6. 

9 . 

it . 

12 . 
17 . 
12. 

13 . 
3 . 

13. 
24 . 



0 . 2 

12. 1 

13. J. 
14 . 5 



7.9 
13.5 
21.4 
13. 6 
14 . 7 

9.8 

C. 2 

4.3 
10. 1 

5.9 

6.6 

12.2 9.5 5 
12.6 10.4 15 
13.5 12.9 17 
9.7 15 



10 . 

6 . 

3 . 
10. 
15. 

3 . 

5. 

9 . 



8 . 7 
5.7 
10. 3 
3 . 6 
7.0 
7 . 3 

:c.3 

14 . 7 
19.0 
14.9 
5.5 



8.3 
15.7 
20. 1 




22.3 





» • * ■ 

1/ 






17.7 15.5 

22.1 21.5 
27.5 28.7 

22.2 24.2 



17.4 
13.0 



19.5 
13.8 



Distances in 
Hypothetical 
E49 M52 

.)m 6^ 

P9 
Til 
K15 
A16 
118 
R20 
F22 
U2 4 
K2 5 
C30 
F33 
V35 
S47 
D50 
C51 
R53 
H39 



236 

Table 36, continued. 
Angstrons between C^eta^- 
Cbeta added to each Glycine. 

P9 TIL K15 A16 118 R20 F22 



N24 



13.8 11.4 
15.6 11.2 



20 
8 
16 
17 



15.7 
5.6 
15.4 
17.3 
9.1 
7.7 
5.4 
6.3 5.6 
23 .9 24 .0 



9" 
7 
5 
2 

4.7 
5.5 
7.1 



7.2 
16.4 
14.9 
12.2 
8.0 
4 . 1 
8.4 
12. 1 
10.6 
4.2 
7.8 
15.3 
14 .7 
11.0 
17.9 
13.0 



9.5 

9.8 6.2 

9.5 10.4 4.9 

9.4 !■: . 9 10 . 6 6.2 
10.6 19.1 16.3 12.7 6.9 

15.3 24.1 21.9 13 . 2 12.7 6.6 
18.6 27.9 26.6 23.3 IB. 1 11.6 
16.6 24.1 20.2 15.7 9.8 6.8 

7, 1 15.0 12.3 9.6 6.1 5.6 

5.8 11.0 7.6 4.9 

18.5 23.1 17.6 12.8 

18.6 24.2 19 .2 14 .7 

16.4 23.5 19. 2 l4 . 6 
23.1 29.6 24,8 20.3 15.0 13.8 15.5 

9.5 12. 0 11.8 12. 5 12.8 14 .7 20. S 



5.9 
6.9 
9.3 

4.3 3.8 14.8 
9.1 12.0 15. 3 
9.9 11.0 14 . 7 
8.7 6.9 9.6 



K2 6 C3'0 

C30 12.4 

F33 13.9 10.1 

V35- 19.5 13.5 6.4 

S47 21.0 8.8 13.5 13.2 

D50 20.1 8.6 14,3 13.7 

C51 15.0 3.7 10.9 12.5 

R53 19.9 9.9 18.2 13.8 



F33 Y35 S47 050 C51 R53 



5.0 
6.9 
^.4 



5.2 
5.3 



R39 24.3 20,6 14.4 9.6 20.4 19.0 



4 

8 23.4 







{ 







Table 37: vgDNA to vary set #2.1










g 


1 P 


1 ^ 


1 k 


I a 1 X 


1 












35 


36 


37 


1 " 


391 40 


! 






5'- 


ICAC 


1 CCT 


GCG 


CCC 


ITGC 


' AAA 


\cCC\r:ty. 




20S 






1 Spacer " 


Aoa r 


1 










i 


+ 

X 


r 


■ y 


f 


1 y 


1 n 


1 a 


1 k 1 






41 


42 


43 




45 


1 46 


1 47| 43 


49 






ATC 




CCT 


TAT 


TTC 


'TAC 


AAC 




*. *. : 




■ 23 5 














/ : 


' = olig 


= 27 


72- nts 


+ 


1 


+■ 








i 












g 


>s 


c 


q 


t 


s. 


V 


y ! g 


g 




50 


51 


52 


53 


54 


55 




57 


53 i 59 


60 




qfk 


GGt 




TCC 


CAG 


ACC 


I TTc 


qfV. 


TAC I CCT 


CCT 


268 


oiigj 28 = 


3 ' - 


acg 


gtc 


tgg 


aag 


* 


dtg cca 


cca 




78 nts 




















Overlap = 


' 12 


(7 CG, 


5 AT) 










c 


r 


a 


k 


r 


n 


n 


: 1 








61 


62 


63 


64 


65 


66 


67 




..ill 






TCC 


CCT 


CCT 


AAC 


CCT 


AAC 


AAC 








295 


acg 


gca 


cga 


ttc 


gca 


ttg 


ttg 


.laa 


Ctt 








1 


ESD T 


1 














s 




e 




c 


m 












70 


7ll 


72 


d j 


74 


75 












TCT 


qfk|CAG 


CATj 


TCC 


ATC 










322 


age 


* *ni 


etc 


eta 


acg 


tac 


gca 


ccc 


acc -5' 














1 Soh rl spicE?r [ 







k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above



Residue 40 
Possibilities 21 
Abundance x 10: 
of PP3D .768 
Produce = 1.77 x 10"^ 



42 50 
21 X 21 



52 57 
21 X 21 X 



X 21 X 21 X 
.271 ,459 .671 .COG 



21 = 8 . 6 X :o' 

459 



Parent = 1/(5.5 x 10^7)

least favored = 1/(4.2 x 10^)

Least favored one-amino-acid substitution from PPBD present
at 1 in 1.6 x 10^



Table 38: Result of varying set#2 of BPTI 2.1



p 

31 

ccc 



p 

32 
CCA 



y 

33 
TAT 



t. 
3^ 
ACT 



35 

ccc 



p 

36 
CCC 



c- 
37 
TCC 



k 

33 



P fl.M I 







Dra 




Pss 






i 


Q 


r 


y 


f 


y 


41 


42 


43 


44 


45 


46 


ATC 


CAG 


CGT 


TAT 


TTC 


TAC 


E 


9 


L 


c 


q 


t 


50 


51 


52 


53 


54 


55 


GAG 


GGC 


CTG 


TCC 


CAG 


ACC 


C 


r 


a 


Jc 


r 


n 


61 


62 


63 


64 


65 


66 


TCC 


CCT 


CCT 


AAG 


CCT 


k/xC 






ESD I 


1 




s 


W 


e 


d 


c 


m 


70 


71 


72 


73 


74 


75 


TCC 


TGG 


CAA 


GAT 


TCC 


ATG 



n 

47 
AAC 



f 

56 
TTT 



n 

67 
AAC 



a 

<3 
CCT 



S 

57 
TCG 



1 


e 


29 


30 


CTC 


GAG 


Ava T 


xho r 


a 


D 


39 


40 


CCC 


CAT 


Jc 




49 




AAA 




y 


g 


58 


59 


TAC 


GOT 



178 



208 



60 
CCT 



f t k 

63 69 

rrr AAA 



1 sp^^ Tl 



r I t I c 
76| 77 78 
CCT I ACC I TCC 



*3 

vol 

GCTJ 



235 
Table 39: vgDNA to vary set#2 BPTI 2.2
Overlap = 15 (11 CG, 4 AT)
agc acc



k = equal parts of T and G; v = equal parts of C, A, and G;
m = equal parts of C and A; r = equal parts of A and G;
w = equal parts of A and T;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above



Residue          33     41     43     44     51     54     55     72
Possibilities     4  x   4  x   9  x   2  x  21  x  21  x  21  x  21  = 6.2 x 10^7
Abundance x 10
of PPBD         2.5    2.5   .833    5.   .663   .397   .437   .602
Product = 2.3 x 10^-8

Parent = 1/(4.4 x 10^7)      least favored = 1/(1.25 x 10^)
Least favored one-amino-acid substitution from PPBD present
at 1 in 1.2 x 10^
Table 40: Result of varying set#2 of BPTI 2.2
Table 41: vgDNA set#2 of BPTI 2.3
67 nts olig#34 3'- g acg ttg cqg
overlap = 13 (7 CG, 6 AT)
n acq cca cca



k = equal parts of T and G; m = equal parts of C and A;
w = equal parts of A and T; n = equal parts of A, C, G, T;
h = equal parts of A, C, T; v = equal parts of A, C, G;

q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue 32 3* 40 A« '° ^ 21 x II = 

Possibilities 6X 6 X 21 X 6x 3X 5x21x^21^^^ 

Abundance x 10 , „ - ->n/a - 701 

of PPBD 10/6 10/6 .545 10/6 10/3 30/8 ..59 .701 

product = 1.01 X 10*^ 

^^r-^nr - 1/M X lo'') least favored = l/(-J x 10^) 

Least favored one-amino-acid substitution from PPBD present
A method of obtaining a protein that binds a
predetermined target that comprises:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an outer-
surface-displayed potential binding protein
comprising (i) a structural signal directing the
display of the protein on the outer surface of the
package and (ii) a potential binding domain for
binding said target, where a plurality of
different potential binding domains are displayed
by said population,

b) causing the expression of said proteins and the
display of said proteins on the outer surface of
such packages,

c) contacting the packages with target material so
that the potential binding domains of the proteins
and the target material may interact, and
separating packages bearing a binding domain that
binds target material from packages that do not so
bind, and

d) recovering and replicating at least one package
bearing a successful binding domain.
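
Steps (a) through (d) describe one cycle of variegation, display, affinity separation, and amplification. A minimal numerical sketch of why repeating the separation of step (c) enriches binding packages is given below. It is an illustration only: the function name and the retention probabilities are hypothetical and are not taken from the specification, and the replication of step (d) is assumed to leave the binder fraction unchanged.

```python
def enrich(binder_fraction, retain_binder, retain_nonbinder, rounds):
    """Track the fraction of binding packages over repeated separations.

    binder_fraction  -- starting fraction of packages that bind the target
    retain_binder    -- assumed probability a binder survives one separation
    retain_nonbinder -- assumed probability a non-binder survives (background)
    rounds           -- number of separation/amplification cycles
    """
    history = [binder_fraction]
    fraction = binder_fraction
    for _ in range(rounds):
        kept_binders = fraction * retain_binder
        kept_others = (1.0 - fraction) * retain_nonbinder
        # Replication is assumed to amplify every survivor equally,
        # so only the separation step changes the fraction.
        fraction = kept_binders / (kept_binders + kept_others)
        history.append(fraction)
    return history


# Hypothetical numbers: one binder per million packages, 50% recovery of
# binders, 0.05% carry-over of non-binders, three cycles.
for cycle, fraction in enumerate(enrich(1e-6, 0.5, 0.0005, 3)):
    print(f"cycle {cycle}: binder fraction {fraction:.6f}")
```

Under these assumed values each cycle multiplies the odds of a package being a binder by the ratio of the two retention probabilities, which is why a rare binder can dominate after a few cycles.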

The method of claim 1 wherein the population of 
replicable genetic packages of step (a) is 
obtained by: 



population
certain predetermined degree of affinity for
target material, and the required degree of
affinity is increased for each new variegated 
population. 

7. The method of claim 1 wherein the displayable 
potential binding protein is a chimeric protein. 

8. The method of claim 7 wherein said signal is
provided by a segment of said chimeric protein 
which is essentially identical in amino acid 
sequence with at least a functional portion of a 
natural outer surface protein encoded by said 
genetic package or a cell naturally infected by 
said genetic package, said portion directing the 
transport of said chimeric protein to the outer 
surface of the genetic package. 

9. The method of claim 2 wherein the second sequence 
is obtained by operably linking a DNA sequence 
encoding a potential outer surface transport 
signal to a DNA sequence expressing a protein that
confers a selectable phenotype to obtain a test
construct, introducing the test constructs into
suitable hosts, causing expression of said DNA
construct, selecting genetic packages that display
the protein that confers the selectable phenotype
on their outer surface, and choosing as said 
second sequence the DNA sequence encoding the 
potential outer surface transport signal of one of 
such selected genetic packages; wherein the 
potential outer surface transport signals encoded 
by the individual test constructs are non- 
identical. 
17. The method of claim 3 in which the binding domain
of the known protein has a known sequence of amino
acids, and the identity and spatial relationship
of the amino acids forming a surface of said
domain is known.

18. The method of claim 3, said target material
comprising one or more discrete molecules, said
parental potential binding domain being
characterized as a sequence of amino acids,
further comprising identifying an interaction set
of amino acids which are on the surface of the
parental potential binding domain and which can
all simultaneously touch a single molecule of the
target material, and obtaining potential binding
domains by substituting a different amino acid for
one or more of the amino acids in said interaction
set.

19. The method of claim 3 wherein the level of
variegation of the population is chosen such that
the packages displaying potential binding domains
obtained by single amino acid substitutions in the
amino acid sequence of the parental potential
binding domain are present in detectable amounts.

20. The method of claim 3 wherein the amino acid
substitutions to be made are chosen after
consideration of the 3D structure of the parental
potential binding domain.

21. The method of claim 15 wherein the amino acid
substitutions to be made are for amino acids of
the chosen domain of the known protein which are
known to be alterable without reducing the melting
affinity separated and retain viability.

30. The method of claim 3 in which the initially
chosen parental potential binding protein has at
least one stable binding domain and said domain
has a melting point of at least 60°C and is stable
over a pH range of at least 3.0-8.0.

31. The method of claim 15 wherein the known binding
protein is an enzyme, the activity of which has a
deleterious effect on the replicable genetic
package, the host of the replicable genetic
package, or the target, wherein the majority of
the nucleic acid constructs code on expression for
an analogue of the known binding protein that does
not have such enzymatic activity.

32. The method of claim 1 wherein the target contains
ionizable groups and the pH of the solutions of
the intended use and the pH of the affinity
separations are chosen so that both the potential
binding protein and the target remain stable.

33. The method of claim 1 wherein the target contains
ionizable groups, further comprising providing
counter ions in affinity separations and the
solutions of the intended use to reduce
electrostatic repulsion between the potential
binding protein and the target.

34. The method of claim 1 wherein the initial
potential binding domain is picked so that, under
the conditions of intended use of the desired
binding protein and under the conditions of
affinity separation, that the potential binding
thereof embodying an outer surface transport
signal;

The method of claim 42 wherein the signal is
provided by the gene III protein of M13 or a
segment thereof embodying an outer surface
transport signal.

46. The method of claim 3 wherein the initially chosen
parental potential binding domain is at least 50%
homologous with the binding domain of bovine
pancreatic trypsin inhibitor, having the residues
C5, C12, C30, F33, C37, C51 and C55.

47. The method of claim 46 further specifying that: a)
residue 21 contains one of the amino acids Y,
W, or I; b) residue 23 contains one of the amino
acids Y or F; c) residue 35 contains one of the
residues Y, F, or W; d) residue 40 contains one of
the amino acids G or A; e) residue 45 contains
either F or Y.

48. The method of claim 47 wherein the residues to be
varied are chosen from among residues 17, 19, 21,
27, 28, 29, 31, 32, 34, 48, 49, and 52.

49. The method of claim 43 wherein the additional
residues 9, 11, 15, 16, 18, 20, 22, 24, 26, 35,
47, and 53 are allowed to vary.

50. The method of claim 47 wherein the residues to
vary are picked from one of the interaction sets
identified in table 34.

51. The method of claim 2 wherein the distribution of
insensitive to UV, tolerant of desiccation, and
resistant to a pH of 2.0 to 10.0.

58. The method of claim 1 wherein the genetic packages
may be frozen and later revived.

59. The method of claim 1 wherein the genetic package
is a cell with a doubling time of 20-40 minutes.

60. The method of claim 1 wherein the genetic package
is a virus with a burst size of at least
100/infected cell.

61. The method of claim 1 wherein the genetic packages
are harvested by centrifugation without loss of
viability.

62. The method of claim 3 wherein the initially chosen
parental potential binding domain is selected from
the group consisting of (a) binding domains of
bovine pancreatic trypsin inhibitor, crambin,
ovomucoid, T4 lysozyme, hen egg white lysozyme,
ribonuclease, and azurin, and (b) domains at least
50% homologous with any of the foregoing domains
and which have a melting point of at least 60°C.

63. The method of claim 36 wherein the outer surface
transport signal is provided by the lamB protein
or a segment thereof embodying an outer surface
transport signal.

64. The method of claim 38 wherein the outer surface
transport signal is provided by the cotA, cotB,
cotC or cotD protein or a segment thereof
embodying an outer surface transport signal.





( 




1 



3 



0 



72. 



350 

is further chosen to yield the largest value for
the quantity ((1 - abundance(stop codons)) times
(abundance of the least abundant amino
acid)/(abundance of the most abundant amino
acid)).
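
The quantity recited above can be evaluated for any mixed-nucleotide codon. The sketch below does so for the q, f, k mixtures given in the legends of the vgDNA tables (q = .26 T, .18 C, .26 A, .30 G; f = .22 T, .16 C, .40 A, .22 G; k = equal parts T and G). It is an illustration under stated assumptions: the function names are hypothetical, the standard genetic code is assumed, and the three codon positions are assumed to be synthesized independently.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})


def residue_distribution(first, second, third):
    """Abundance of each amino acid (and stop, '*') for a mixed codon.

    Each argument maps a base to its fraction at that codon position.
    """
    dist = {}
    for b1, p1 in first.items():
        for b2, p2 in second.items():
            for b3, p3 in third.items():
                aa = CODE[b1 + b2 + b3]
                dist[aa] = dist.get(aa, 0.0) + p1 * p2 * p3
    return dist


def figure_of_merit(dist):
    """(1 - stop abundance) * (least abundant aa) / (most abundant aa)."""
    abundances = [dist.get(aa, 0.0) for aa in AMINO_ACIDS]
    return (1.0 - dist.get("*", 0.0)) * min(abundances) / max(abundances)


q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
f = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
k = {"T": 0.50, "G": 0.50}

dist = residue_distribution(q, f, k)
print(f"stop abundance:  {dist['*']:.3f}")
print(f"figure of merit: {figure_of_merit(dist):.3f}")
```

Comparing this value across candidate mixtures is one way to rank them; an amino acid that a mixture cannot encode drives the quantity to zero.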

The protein of claim 66, wherein the protein
comprises a first foreign domain recognizing a
first target material and a second foreign domain
recognizing a second target material.